ABSTRACT


This project details the development of a novel and practical framework for building 
conversation agents (CAs), or conversation robots. CAs are software programs that can 
provide a natural interface between humans and computers. In this study, ‘conversation’ 
refers to real-time dialogue exchange between humans and machines, which may range 
from web chatting to “on-the-go” conversation through mobile devices. In essence, the 
project proposes a “smart and effective” communication technology in which an 
autonomous agent is able to carry out simulated human conversation via multiple 
channels. The CA developed in this project is termed “Artificial Intelligence 
Natural-language Identity” (AINI), and AINI is used to illustrate the implementation and 
testing carried out in this project. Until now, most CAs have been developed with the 
short-term objective of convincing users that they are talking with a real human, as in 
the Turing Test. Traditional designs have relied mainly on ad-hoc approaches and 
hand-crafted domain knowledge. Such approaches make it difficult to develop a fully 
integrated system or to modify it for other domain applications and tasks. The framework 
proposed in this thesis addresses these limitations. Overcoming the weaknesses of 
previous systems has been the key challenge in this study. The research has provided a 
better understanding of the system requirements and has led to a systematic approach for 
constructing intelligent CAs, based on an agent architecture that uses a modular N-tiered 
approach. This study demonstrates an effective 
implementation and exploration of the new paradigm of Computer Mediated Conversation 
(CMC) through CAs. The most significant aspect of the proposed framework is its ability 
to re-use and encapsulate expertise such as domain knowledge, natural language query and 
human-computer interface through plug-in components. As a result, the developer does not 
need to change the framework implementation for different applications. The proposed 
system provides interoperability among heterogeneous systems and has the flexibility to 
be adapted to other languages, interface designs and domain applications. A modular 
design of knowledge representation facilitates the creation of the CA knowledge bases. 
This enables easier integration of open-domain and domain-specific knowledge with the 
ability to provide answers for broader queries. In order to build the knowledge base for the 
CAs, this study has also proposed a mechanism to gather information from commonsense 
collaborative knowledge and online web documents. The proposed Automated Knowledge 
Extraction Agent (AKEA) has been used for the extraction of unstructured knowledge from 
the Web. Because extracted knowledge is only as reliable as its sources, it is also 
important to establish the trustworthiness of the sources of information. This thesis 
therefore introduces a Web Knowledge Trust Model (WKTM) for this purpose. 

In order to assess the proposed framework, relevant tools and application modules have 
been developed and an evaluation of their effectiveness has been carried out to validate the 
performance and accuracy of the system. Both laboratory experiments and public 
experiments with online users in real time have been carried out. The results have shown 
that the proposed system is effective. In addition, it has been demonstrated that the CA 
can be implemented on the Web, on mobile services and over Instant Messaging (IM). In the real-time human-machine 
conversation experiment, it was shown that AINI is able to carry out conversations with 
human users by providing spontaneous interaction in an unconstrained setting. The study 
observed that AINI and humans share common properties in linguistic features and 
paralinguistic cues. These human-computer interactions have been analysed, contributing 
to the understanding of how users interact with CAs. Such knowledge is also useful for 
developing conversation systems that exploit the commonalities found in these 
interactions. AINI was found to have difficulty responding to some forms of 
paralinguistic cues, and this points to directions for further research to improve CA 
performance in the future. 
CONTRIBUTIONS OF THE THESIS 


The contributions in this thesis which have been published and reported are described 
below and summarised in Table 1.1. 

A survey and review of techniques used in the development of CA systems has been 
published in conference paper P8, which was later extended to journal paper J5 and is 
described in Chapter 2. The paper reviews the state of the art in the discipline. It presents 
the results of the initial literature study on conversation systems and of how evaluation 
has been conducted with respect to the “naturalness” and “humanness” of human-machine 
conversation, as required in the Turing Test (TT). 

The development of the new CA framework design forms part of Chapter 3. The work 
has been reported in papers P10, P13 and P14, and these three conference papers have been 
extended to journal papers J7, J10 and J11 respectively. Paper J10 was a keynote address 
presented at the International Conference on eLearning for Knowledge-based Society 2005. 
In addition, papers P5 and P10 received Best Paper Awards at the International Conference 
on Internet Computing and Web Services in 2007 and the International MultiConference of 
Engineers and Computer Scientists in 2006, respectively. Papers J6 and J12 demonstrated 
the applicability and adaptability of AINI’s framework to specific domains relevant to the 
SARS epidemic and the bird flu pandemic. 

During the writing of papers P6, P7 and P11, it became obvious that the publicly available 
Google API (Application Programming Interface) and Google PageRank have great 
potential in identifying unbiased seeds and corpora for building the CAs’ knowledge bases. 
Papers J4 and J5 described the experiments with Google API and Google PageRank as the 
main sources from which trustworthy CA knowledge bases were established. Journal 
paper J5, published in the special issue on Intelligent Web Interaction and extended from 
P7, showed that the Google API can be used to simplify the information discovery 
process. This paper proposed the Web Knowledge Trust Model (WKTM) to determine the 
trustworthiness of relevant sources from the Web. Paper P15 introduced a novel approach, 
the Automated Knowledge Extraction Agent (AKEA), which constitutes the core 
contribution described in Chapter 4. This paper was also extended to book chapter B2. 

The contribution in Chapter 5 is the establishment of a baseline for evaluating CAs in 
comparison with other query systems such as search engines, question-answering systems 
and conversation systems. The comparison was based on qualitative and quantitative 
approaches, and it also gave insight into the performance of the natural language parsers. 
Paper P13 reported the evaluation of the quality of the query systems and was 
subsequently extended to journal paper J9. This approach can be used as a benchmark for 
evaluating new systems in other domains. 

Chapters 6 and 7 complete the research work with an evaluation of real-time 
human-machine interaction, and the findings have been reported in papers P1 to P5, J1 to 
J4 and B1. These papers described the rationale and the results of a public real-time 
experimental evaluation in an unconstrained domain with unrestricted duration. The 
empirical approach was based on the analysis of a number of conversation logs collected 
from human-machine interaction via MSN Messenger. The analyses include an extensive 
account of observed dialogue phenomena, including the linguistic features and 
paralinguistic cues of the human-machine utterances, as well as the topics of interest. 
Table 1.1: Summary of the Contribution of the Thesis 

| Chapter | Contributions | Paper No. |
| --- | --- | --- |
| Background | Literature survey of previous research work, from classical CAs and Loebner Prize CAs to commercial CAs. | P8 |
| Conversation Agents Framework Design | Proposal of a modified N-tiered architecture that provides a reusable, extensible, scalable and modular (RESM) design for a heterogeneous CA framework. | P14, J12 |
| AGENT BRAIN (Application Server Tier) | Development of a novel top-down, multi-level natural language query approach. | P10, P12, P16, J17 |
| AGENT KNOWLEDGE (Data Server Tier) | Introduction of a Web Knowledge Trust Model (WKTM) to establish the Conversation Agents’ knowledge, which consists of open-domain and domain-specific knowledge bases. The main contribution of this model is the proposal and development of domain-specific knowledge from trustworthy online documents using the Automated Knowledge Extraction Agent (AKEA). | P6, P7, P11 |
| AGENT BODY (Client Tier) | Proposal and development of a multiple-channel communication approach for greater CA autonomy. | P9, J6, J8, J10 |
| An Assessment of the Trustworthiness of Knowledge Bases for Conversation Agents | Through Google API, Google PageRank and Web Credibility, the World Wide Web is used as the main resource to find and extract trustworthy web pages using the proposed Web Knowledge Trust Model (WKTM). The Automated Knowledge Extraction Agent (AKEA) is used to retrieve and dynamically construct trusted Web knowledge from semi-structured data. | P3, P15, J2, J4, J5, B2 |
| An Evaluation of the Conversation Agent Framework | Short-term, lab-based controlled experiments are used to verify the proposed framework design. The evaluation demonstrated possible ways to evaluate the quantitative performance and accuracy of the parsers, and the response quality of the AINI conversation system. | P2, P13, P17, J9, J11 |
| An Analysis of the Linguistic Features from Real-time Human-Machine Interaction | VisualChat tools have been developed to visualise the linguistic features and paralinguistic cues of the conversation between humans and CAs in the real-time experiment. Results from the experiment showed that humans and machines can communicate better in an unrestricted domain, without a time limit and in an unconstrained setting. | P1, P4, P5, J1, B1 |
| An Analysis of the Paralinguistic Cues from Real-time Human-Machine Interaction | The study also observed that human participants, or AINI’s buddies, expressed their ideas and feelings through paralinguistic cues in the IM environment. By incorporating this feature, AINI provides better, more human-like conversations with the users. | J3 |
CHAPTER 1 


INTRODUCTION 


1.1 Overview 

Communication technologies have advanced greatly since the invention of the telegraph 
in 1794¹. They have played important roles in connecting people and enabling them to 
exchange information with one another. In particular, fixed-line and mobile telephone 
systems have allowed users to carry out real-time conversations anytime and almost 
anywhere. However, advancements in computer and Internet technologies have taken the 
meaning of ‘conversation’ to a new dimension. Conversation is no longer limited to 
exchanges between humans; it can now describe communication between humans and 
computers. A software application that is capable of simulating human conversation is 
known as a conversation agent (CA). Providing CAs with knowledge and intelligence has 
allowed them to be used in several practical applications. For example, Virtual 
Personalities by Verbot² is advertised as a digital friend. CAs have also been employed as 
therapists, counsellors, teachers, trainers, salespersons and corporate representatives by 
companies such as Spleak³, Extempo⁴, Artificial Life⁵, Kiwilogic⁶ and Oddcast Inc⁷. 


Recently, there has been great interest in the development of CAs, driven by the needs of 
commercial applications. Many e-service providers are motivated to incorporate natural 
language interfaces into their websites in order to improve customer relationships and 
increase sales. The agents used are human-like avatars acting as representatives for their 
respective companies. However, these agents focus mainly on an interface that attracts 
viewers, rather than on maintaining an ongoing interaction with the users. 

1 http://inventors.about.com/od/tstartinventions/a/telegraph.htm 
2 http://www.verbots.com 
3 http://www.spleak.com 
4 http://www.extempo.com 
5 http://www.artificial-life.com 
6 http://www.kiwilogic.com 
7 http://www.oddcast.com 

Previous research works (discussed in Section 2.4), from classical CAs, through CAs in 
the Loebner Prize competition, to commercial CAs, have produced many kinds of 
conversation systems. These systems have all been developed for specific purposes, such 
as artificial intelligence (AI) research, entry into the competition or commercial use. For 
example, the development of Loebner Prize CAs could have been stimulated by the prize 
money of up to $100,000 offered to the developers of the first system that can pass the 
Turing Test (TT) [5]. In general, each of these solutions was custom-designed for a 
specific problem and does not generalise, so it may not be easily adapted to other 
purposes. 

The primary aim of this study is to propose a framework for the development of 
reusable, extensible, scalable and modular CAs, called AINI (Artificial Intelligence 
Natural-language Identity). The developed framework is based on an agent architecture 
using a modular N-tiered approach. Under this architecture, different CAs handling 
different domains can be developed independently and can cooperate with one another to 
respond to the user’s requests. Our CA framework is primarily a client-server architecture 
that is capable of interacting with a user via text, speech and animated avatars to present 
information in a web browser environment, a mobile service or a computer-mediated 
communication (CMC) application. The proposed framework is not limited to the 
communication channels stated above; it could also be extended for use in future devices 
with an appropriate interface. The proposed framework gives AINI-enabled applications 
the ability to understand and react to users based on their interactions, and to extract 
meaning from free-form text input. 

This thesis addresses several challenges that arise in the development of practical CAs. 
It can be expected that conversation agents such as AINI will appear in many applications 
in the future. AINI will not just interact with humans, but will act as a prominent channel 
or medium of communication [10, 11]. These interfaces are increasingly being embodied 
in the human-computer interface (HCI). Embodiment allows an interface to share verbal 
and non-verbal information, such as linguistic features and paralinguistic cues. The 
proposed framework contributes to knowledge about, and the popularisation of, CAs, not 
only on the web but also on mobile services and CMC applications. Understanding 
unbounded natural language has been one of the challenges in this work. This study 
observed that aspects of human-machine conversation, such as linguistic features and 
paralinguistic cues, could be used to trace potential problems in communication, and this 
could be used to improve the performance of the AINI conversation agent. 

Within the proposed architecture, the system can be defined as an N-tiered design. The 
first tier, the “Application Server Tier”, contains the modules that perform natural 
language understanding and reasoning; in this thesis it is called the agent’s brain. The 
second tier, the “Data Server Tier”, where the large-scale knowledge bases reside, is 
called the agent’s knowledge. The final tier is the “Client Tier”, also called the agent’s 
body, which serves as the interface or communication channel for AINI. In the proposed 
development, AINI endeavours to leverage the breadth and power of human 
conversational competency by using linguistic properties and paralanguage 
characteristics to enhance and prolong the conversations. 

In order to meet the goal of developing a practical CA, the process has drawn on 
knowledge and understanding of human-machine protocols from multiple disciplines, 
such as Computational Linguistics (CL), Natural Language Processing (NLP) and 
Human-Computer Interface (HCI). This has led to the development of a Domain 
Knowledge Matrix Model (DKMM), a Web Knowledge Trust Model (WKTM) and an 
Automated Knowledge Extraction Agent (AKEA) in the agent’s knowledge. The study 
also proposes a multi-level, independent natural-language query solution in the agent’s 
brain, which is based on the multiple knowledge modules. The agent’s body employs 
multimodal interface channels that can be installed as “plug-in” modules. This capability 
is a key idea and contribution of the developed solution: the modular approach enables 
the system to be easily customised to other domain applications and tasks. 
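One way to realise “plug-in” interface channels is an abstract channel interface plus a registry into which concrete channels are installed. The sketch below assumes this design. `Channel`, `WebChannel`, `MobileChannel` and `ChannelRegistry` are hypothetical names. The 160-character truncation is only an illustrative stand-in for a short-message length limit.

```python
from abc import ABC, abstractmethod
from typing import Dict


class Channel(ABC):
    """Hypothetical plug-in interface for an Agent Body channel."""

    name: str = "abstract"

    @abstractmethod
    def render(self, reply: str) -> str:
        """Adapt a reply from the agent's brain to this channel's format."""


class WebChannel(Channel):
    name = "web"

    def render(self, reply: str) -> str:
        return f"<p>{reply}</p>"


class MobileChannel(Channel):
    name = "mobile"

    def render(self, reply: str) -> str:
        # Illustrative constraint: keep replies within a short-message length.
        return reply[:160]


class ChannelRegistry:
    """Installs channels at run time; unknown channel names raise KeyError."""

    def __init__(self) -> None:
        self._channels: Dict[str, Channel] = {}

    def install(self, channel: Channel) -> None:
        self._channels[channel.name] = channel

    def deliver(self, channel_name: str, reply: str) -> str:
        return self._channels[channel_name].render(reply)
```

Adding a new modality then means writing one `Channel` subclass and installing it. Neither the registry nor the rest of the agent needs to change.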

Instead of building a complex CA from scratch, appropriate modules from existing 
systems, or modules built within the project, were integrated according to the proposed 
framework. This constructive design approach involves progressive development, 
analysis and testing of the system's ability to interact with humans, and comparison with 
other systems in the laboratory. 

Finally, the AINI framework design has been evaluated by testing how humans interact 
with it in a public environment through instant messaging (IM). In this experiment, 
participants were able to express themselves freely without any restrictions, such as the 
five-minute time limit of the original Turing Test or the ten-minute limit of the Loebner 
Prize. In addition, the conversation was not restricted to any particular domain or 
application. To study empirically how users behave and interact with these experimental 
systems, human participants took part in an unconstrained, natural conversation setting by 
“chatting” directly through IM. The purpose of the studies has been to increase our 
understanding of how AINI can utilise the regularities found in human-computer 
interaction. This helps both in the design of the system and in the collection of unbiased 
natural-language expressions from users. The process of developing the practical AINI 
framework is illustrated in Figure 1.1. 



Figure 1.1: Conversation Agents Development Process 

The framework and models developed in this work will benefit the fields of 
Human-Computer Interface (HCI), Natural Language Processing (NLP), Information 
Retrieval (IR), and Artificial Intelligence (AI), and pave the way for the next generation 
of intelligent computers which can interact with humans more naturally and effectively. 

1.2 Ethical Considerations 

CAs, as with any other technology, can be abused [12] and could become a significant 
source of manipulation and control over individuals. For this reason, this research 
sought advice and clearance from the university ethics committee. As a result, 
participants in the CA evaluation were issued with consent forms before taking part in 
the study. 

There are those who also feel that any anthropomorphic interface is unethical, because it 
unrealistically raises users’ expectations. Therefore, in the first interaction, AINI will 
send an alert message identifying “herself” as a conversation agent with limited 
capabilities. Such language can help users adjust their expectations appropriately at the 
start of an interaction. 

In addition, AINI is not programmed to prompt the user in any way for any particular 
type of information. AINI was designed to mimic the questions asked by the 
participants and will not take the initiative to ask specific questions on any subject 
matter. However, AINI is able to initiate a conversation by generating dynamic 
questions or by using some form of distraction routine. When AINI is unable to answer a 
participant's question, instead of replying "I don't understand." or "I can't follow you.", 
AINI will use one of the dynamic responses or distraction routines to ask the participant a 
question. AINI is also equipped with questions that allow only a limited number of 
responses, which helps it prepare an appropriate answer. For example, instead of asking 
"What is the meaning of life?", AINI will ask "Do you prefer books or TV?". 
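The behaviour described above can be sketched as a thin wrapper around an answering function. The sketch assumes two things. First, the alert message is prepended once per user, on the first interaction. Second, whenever no answer is found, a closed question is chosen at random. The alert wording, the list of distraction questions and the `answer_fn` interface are hypothetical.

```python
import random
from typing import Callable, Optional, Set

# Hypothetical wording; the thesis does not give AINI's exact alert text.
ALERT = "Hi, I am AINI, a conversation agent with limited capabilities."

# Closed questions allow only a few possible replies, so a follow-up is easier to prepare.
DISTRACTION_QUESTIONS = [
    "Do you prefer books or TV?",
    "Do you prefer tea or coffee?",
]


class FallbackResponder:
    def __init__(self, answer_fn: Callable[[str], Optional[str]],
                 rng: Optional[random.Random] = None):
        self.answer_fn = answer_fn  # returns None when no answer is available
        self.rng = rng or random.Random()
        self.greeted: Set[str] = set()

    def reply(self, user_id: str, utterance: str) -> str:
        parts = []
        if user_id not in self.greeted:
            # First interaction: disclose that the user is talking to a machine.
            self.greeted.add(user_id)
            parts.append(ALERT)
        answer = self.answer_fn(utterance)
        if answer is None:
            # Never reply "I don't understand"; ask a closed question instead.
            answer = self.rng.choice(DISTRACTION_QUESTIONS)
        parts.append(answer)
        return " ".join(parts)
```

For example, with an `answer_fn` that only knows how to greet, an unanswerable question such as "What is life?" is answered with one of the closed questions.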

Stringent guidelines will be used to ensure participants’ confidentiality. No 
questionnaires or surveys will be given out. Participants in this study will be given a guide 
on how to invite AINI into their MSN contact list, as shown in Appendix A. The data 
collected during evaluation will be stored securely for a period of time in accordance 
with the Murdoch University Guidelines from the Human Research Ethics Committee 
(HREC)⁸. The evaluation projects must be fully in line with the ethical research policies 
and requirements of the HREC. 


1.3 Delimitations of the Thesis 

This research will not attempt to do the following: 

• Use all possible storage formats of existing knowledge. The input 
knowledge base will be restricted to XML, or to formats easily 
transformed to XML. 

• Hold conversations in languages other than English. Although AINI’s 
framework supports multiple languages, conversations will be limited to 
English in this study. 

• Update the input knowledge source. It is assumed that the knowledge 
source will be sufficient to cover all the necessary requirements for the 
specific applications of interest. 

• Deal directly with speech recognition as the user input. The research 
proposed here will concentrate on typed input. 

• Fully automate the conversion from web documents to CA knowledge 
bases. (A human expert will have to be involved in many cases, especially 
in evaluating the trustworthiness of websites.) 

• Cater for all users. Instead, the focus is on users aged 18 and over. 
However, no one has been excluded from participation in the evaluation 
phase on the basis of culture, gender or location. 

8 http://www.research.murdoch.edu.au/ethics/ 
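The first delimitation restricts input knowledge bases to XML. The sketch below loads such a knowledge base with Python's standard `xml.etree.ElementTree` module. The element names (`knowledge`, `entry`, `pattern`, `response`) and the `domain` attribute are a hypothetical schema chosen for illustration. The thesis does not specify AINI's XML layout here.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Tuple

# Hypothetical schema: one <entry> per pattern/response pair.
SAMPLE_KB = """<knowledge domain="general">
  <entry>
    <pattern>WHAT IS SARS</pattern>
    <response>SARS is severe acute respiratory syndrome.</response>
  </entry>
  <entry>
    <pattern>HELLO</pattern>
    <response>Hi there!</response>
  </entry>
</knowledge>"""


def load_knowledge_base(xml_text: str) -> Tuple[Optional[str], Dict[str, str]]:
    """Parse the XML and return (domain, {normalised pattern: response})."""
    root = ET.fromstring(xml_text)
    kb: Dict[str, str] = {}
    for entry in root.iter("entry"):
        pattern = entry.findtext("pattern", default="").strip().upper()
        response = entry.findtext("response", default="").strip()
        if pattern and response:  # skip incomplete entries
            kb[pattern] = response
    return root.get("domain"), kb
```

Any source that can be transformed into this structure, for example by an XSLT or a conversion script, could then be consumed by the same loader.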


1.4 Organisation of the Thesis 

This chapter has presented the introduction and overview of the research. As a 
background to this thesis, Chapter 2 introduces related past research works and 
perspectives on CAs, which were the motivation behind this research into the 
understanding and design of CAs in human-machine communication. 

Chapter 3 proposes a development methodology and a series of explorations in the 
practical framework design and software architecture of a CA called AINI. The primary 
aim of the proposed framework was to build a reusable, extensible, scalable and modular 
CA. This methodology is based on our constructive approach, using a modified N-tiered 
architecture, similar to a Service-Oriented Application architecture, which addresses the 
extensive variability encountered in today’s CA frameworks. The software architecture 
presented in this chapter demonstrates a novel approach: its modular, multi-module design 
is extensible through polymorphic inheritance of components and can be customised to 
any particular application domain. This chapter addresses the fundamental architecture 
issues. The user interface (Agent Body) may employ several input and output modalities 
or protocols, allowing the user to choose which one to use for interaction. It is well 
understood that true intelligent action requires large quantities of knowledge. Such a 
reservoir of knowledge can be harvested from the Internet or extracted from existing 
training datasets and knowledge bases (Agent Knowledge). AINI’s query engine (Agent 
Brain) implements its decision-making network based on the information it encounters in 
six levels of natural language query modules. This architecture forms a basis for the 
construction of an intelligent conversation system that provides appropriate answers to 
the user. 
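A top-down, multi-level query engine can be sketched as an ordered cascade: each level either returns an answer or passes the utterance to the next level. The text does not name AINI's six levels. The two levels below (an exact FAQ match and a keyword match) are hypothetical placeholders that show the control flow only.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

Handler = Callable[[str], Optional[str]]


def multilevel_query(utterance: str,
                     levels: Sequence[Tuple[str, Handler]]) -> Tuple[Optional[str], Optional[str]]:
    """Try each level top-down; the first level that produces an answer wins."""
    for name, handler in levels:
        answer = handler(utterance)
        if answer is not None:
            return name, answer
    return None, None  # no level could answer; the caller may use a fallback


# Hypothetical levels for illustration only.
FAQ: Dict[str, str] = {"what is sars": "SARS is severe acute respiratory syndrome."}


def faq_level(utterance: str) -> Optional[str]:
    return FAQ.get(utterance.strip().lower().rstrip("?"))


def keyword_level(utterance: str) -> Optional[str]:
    return "Let's talk about bird flu." if "flu" in utterance.lower() else None


LEVELS = [("faq", faq_level), ("keyword", keyword_level)]
```

Ordering the levels from most specific to most general means a precise answer is preferred whenever one exists. Returning the level name alongside the answer makes it easy to log which level handled each query.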

Chapter 4 presents a novel contribution to the fields of information retrieval and 
computational linguistics. In this chapter, a Web Knowledge Trust Model (WKTM) and 
Automated Knowledge Extraction Agents (AKEA) have been designed to take advantage 
of the documents available on the World Wide Web. This study proposes an effective 
cooperative evaluation of unbiased seeds and corpora, so that trustworthy websites are 
promoted for extraction into AINI’s domain-specific knowledge base, and then verifies 
the mechanism. 
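WKTM combines Google PageRank with web credibility to decide which sites are trustworthy enough for extraction. The text does not give the WKTM formula. The sketch below assumes a simple linear blend of a 0 to 10 PageRank value and an expert-assigned credibility score in [0, 1]. The `Source` type, the weight, the threshold and the example URLs are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Source:
    url: str
    pagerank: int       # assumed 0 to 10 scale
    credibility: float  # assumed expert-assigned score in [0, 1]


def trust_score(src: Source, weight: float = 0.5) -> float:
    """Hypothetical linear blend; not the WKTM formula from the thesis."""
    if not 0 <= src.pagerank <= 10:
        raise ValueError("pagerank must be between 0 and 10")
    if not 0.0 <= src.credibility <= 1.0:
        raise ValueError("credibility must be between 0 and 1")
    return weight * (src.pagerank / 10) + (1 - weight) * src.credibility


def trusted_sources(sources: Sequence[Source], threshold: float = 0.6,
                    weight: float = 0.5) -> List[str]:
    """Return URLs whose score meets the threshold, most trusted first."""
    ranked = sorted(sources, key=lambda s: trust_score(s, weight), reverse=True)
    return [s.url for s in ranked if trust_score(s, weight) >= threshold]
```

Keeping the human-assigned credibility score as a separate input reflects the delimitation that a human expert remains involved in judging the trustworthiness of websites.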

Chapter 5 discusses controlled laboratory experiments based on quantitative and 
qualitative approaches. Three short-term, laboratory-controlled evaluations were carried 
out to verify the framework design of the AINI architecture. These evaluations are 
intended to quantify the performance and accuracy of the parsers, and the quality of the 
responses compared with three other query systems. 
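One quantity measured in Chapter 5 is parser accuracy. The sketch below assumes accuracy is the fraction of test utterances for which a parser's output matches a hand-labelled expected result. The text does not define the metric precisely, and the parser names and data are hypothetical.

```python
from typing import Dict, List, Tuple


def parser_accuracy(results: Dict[str, List[Tuple[str, str]]]) -> Dict[str, float]:
    """Map each parser to the fraction of (expected, actual) pairs that match."""
    scores: Dict[str, float] = {}
    for parser, pairs in results.items():
        if not pairs:
            scores[parser] = 0.0  # no test cases: report zero rather than divide by zero
            continue
        correct = sum(1 for expected, actual in pairs if expected == actual)
        scores[parser] = correct / len(pairs)
    return scores
```

Exact-match scoring is the simplest choice. A partial-credit measure could be substituted without changing the surrounding loop.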

Chapter 6 details an analysis of a human-machine interaction corpus collected from a 
real-time public environment. This real-life experiment is aimed at increasing our 
understanding of how people interact with computers, and at learning how AINI can 
utilise the regularities found in these interactions. Two analyses have been conducted on 
this corpus. The first, presented in Chapter 6, concerns the linguistic features of 
human-machine conversation. The second, presented in Chapter 7, concerns the 
correlation between the non-linguistic, or paralinguistic, cues used by AINI and by 
humans. The results from these experiments demonstrated that communicators (humans 
or machines) should be equipped with vast bodies of knowledge in order to carry out 
meaningful conversation across different domains. Over time, CAs could become more 
efficient by using more human-like spoken language and phrases. The results of these 
experiments also yielded important findings on long-term relationships and their effects 
on human-computer communication technology. 
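Counting paralinguistic cues in IM logs can be sketched with regular expressions, as shown below. The cue inventory (simple emoticons, laughter tokens and repeated punctuation) is a hypothetical example. It is not the coding scheme used in Chapter 7.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical cue inventory for illustration.
CUE_PATTERNS = {
    "emoticon": re.compile(r"[:;]-?[()DPp]"),
    "laughter": re.compile(r"\b(?:lol|haha+|hehe+)\b", re.IGNORECASE),
    "repeated_punctuation": re.compile(r"[!?]{2,}"),
}


def count_cues(utterances: Iterable[str]) -> Counter:
    """Count occurrences of each cue type across a sequence of chat messages."""
    counts: Counter = Counter()
    for text in utterances:
        for cue, pattern in CUE_PATTERNS.items():
            counts[cue] += len(pattern.findall(text))
    return counts
```

Running the same counter separately over the human turns and the AINI turns of a log gives per-speaker counts. These counts can then be compared.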

Finally, Chapter 8 presents the thesis conclusion, which includes a summary, 
contributions, limitations and a discussion of future directions. Future work towards the 
development of successful companion conversation robots (CCRs) is discussed. This 
study showed that the proposed framework can also be naturally extended and adapted to 
physical robots that communicate with humans, a type of application that previous works 
have not addressed. 
CHAPTER 2 


BACKGROUND 


2.1 Introduction 

An understanding of intelligent conversation systems is needed to develop practical 
conversation agent (CA) framework designs. CAs (which may also be called 
conversation robots, chatterbots, or simply ‘bots’) are computer programs or 
application software designed to simulate an intelligent conversation with one or more 
human users in natural language. In this context, the Turing Test (TT) is the 
well-established instrument for assessing the performance of CAs. The properties of the 
TT and descriptions of a range of CAs are outlined in this chapter. 

A number of different types of CAs have been developed over time, ranging from 
classical CAs and those developed for the Loebner Prize competition to commercial CAs. 
However, at the time of writing, none of these CAs has been able to pass the TT, which 
has a time limit of five minutes, or the Loebner competition, which sets the duration to 
ten minutes. This chapter argues that a program with a limited, handcrafted and restricted 
knowledge base, a simple pattern recognition technique and a few linguistic or AI tricks 
will be unable to hold a long-term conversation with a human. In addition, most current 
CAs are based on ad-hoc designs, and were created more from an interest in passing the 
TT than for use in any real-life application. This thesis therefore addresses this challenge 
and introduces a practical framework, described in Chapter 3, for the development of 
CAs that aims to overcome these restrictions. 
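The toy responder below shows what a “simple pattern recognition technique” looks like. It matches each utterance against a few regular-expression rules and fills in a response template. The rules are invented for illustration and are not taken from any particular CA.

```python
import re

# Toy rule set in the style of classical pattern-matching CAs (illustrative only).
RULES = [
    (re.compile(r"\bI need (.+)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r"\bI am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\b(?:hello|hi)\b", re.IGNORECASE), "Hello. How are you feeling today?"),
]
DEFAULT = "Please tell me more."


def respond(utterance: str) -> str:
    """Return the template of the first matching rule, or a stock default."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(*match.groups())
    return DEFAULT
```

No state carries over between turns, and anything outside the handful of rules falls through to the same default. This is one reason such programs struggle to sustain a long conversation.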
2.2 Artificial Intelligence 


Artificial Intelligence (AI) is a branch of computer science that aims to produce 
"intelligent" thought or behaviour in a computer or machine. The field has close ties to 
psychology, philosophy and cognitive science. According to Russell and Norvig [13], AI 
systems can be divided into four categories, as shown in Figure 2.1: 


“The study of mental faculties through the use of computational models” [4]; “The 
study of the computations that make it possible to perceive, reason, and act” [6] 

“The exciting new effort to make computers think ... machines with minds, in the full 
and literal sense” [2]; “The automation of activities that we associate with human 
thinking, activities such as decision-making, problem-solving, learning ...” [7] 

“A field of study that seeks to explain and emulate intelligent behaviour in terms of 
computational processes” [3]; “The branch of computer science that is concerned with 
the automation of intelligent behaviour” [8] 

“The art of creating machines that perform functions that require intelligence when 
performed by people” [1]; “The study of how to make computers do things at which, at 
the moment, people are better” [5] 


Figure 2.1: Artificial Intelligence Categories [9] 


Historically, many important social and technological events led up to modern ideas 
about Applied AI. Comprehensive listings and timelines chronicling major AI milestones 
can be found elsewhere¹⁶, and are summarised in Figure 2.2. 


[Timeline, 1941 to 2006. Milestones shown: first electronic computer; first commercial stored-program computer; Turing Test; birth of AI; Logic Theorist developed; Lisp language developed; start of DoD's Advanced Research; MicroWorld program SHRDLU created; Prolog language revealed; first expert system; AI-based hardware sells $420 million to companies; AI military system used effectively in Desert Storm; Deep Blue chess machine (IBM) beats the world chess champion; Tim Berners-Lee publishes the Semantic Web Road Map; Sony introduces intelligent “pets” (AIBO); ASIMO robot walks as fast as a human; SPARQL, a query language for the Semantic Web.] 

Figure 2.2: Artificial Intelligence Timelines 


Human beings have long been curious about how the mind works and fascinated by 
intelligent machines. However, it was not until the post-war period (1945-1956) that AI 
emerged as a widely discussed field. The genesis of AI was propelled by the arrival of 
modern computer technology and the emergence of a critical mass of researchers, 
including pioneers such as Allen Newell and Herbert Simon, who founded the first AI 
laboratory at Carnegie-Mellon University, and John McCarthy and Marvin Minsky, who 
founded the MIT AI Lab in 1959. The development of modern computer technology 
affected AI research tremendously. Developments of particular interest for the birth of 
AI came in the mid-20th century. In 1950, Turing proposed a paradigm to establish 
whether or not a machine is intelligent in his well-known article, “Computing Machinery 
and Intelligence” [13]. 

2.3 Turing Test and Loebner Prize 

Alan Turing was a brilliant British mathematician who played a major role in the development of the computer and posed one of the most famous challenges in Computer Science. The Turing Test (TT) [13], one of his most enduring contributions, is a simple test he proposed in 1950 that remains one of the most debated issues in AI. Turing's proposition was simple: if human beings are intelligent, and if a machine can imitate a human, then the machine should also be considered intelligent. This test is also known as the Imitation Game (IG).

16 http://biology.kenyon.edu/slonc/bio3/AI/TIMELINE/timeline.html

In the TT for machine intelligence, an observer has to distinguish between the machine 
and a human by asking a series of questions through a computer link. This situation is 
depicted in Figure 2.3. 



Turing [13] predicted that:

“... in about fifty years' time it will be possible to programme computers, with a storage capacity of about 10^9, to make them play the imitation game so well that an average interrogator will not have more than 70 percent chance of making the right identification after five minutes of questioning. ... I believe that at the end of the century the use of words and general educated opinion will have altered so much that one will be able to speak of machines thinking without expecting to be contradicted.”


Based on the TT, a computer program is said to be "intelligent" if (and only if) it "fools" a human into believing that the computer is also a human. More than fifty-five years have passed since Turing made this prediction, and the storage capacities of today's standard computers far exceed the roughly 10^9 binary digits (about 125 MB) that he envisaged. Processor speed has also increased by several hundred times since 1980, while storage capacity has increased even more rapidly, as shown in Table 2.1. With the exception of industrial-strength servers, today's computers are small and can be globally networked. Even Microsoft founder Bill Gates did not expect this rapid increase [14]. However, no artificial system that is generally believed to be intelligent has been built to date, although many fields of science share exactly this goal: building and understanding intelligent systems. No computer program has yet been able to pass as a human as often as Turing predicted, and many years of unsuccessful attempts show that this is not an easy task. The TT therefore remains unpassed, which has fuelled a number of arguments against the test itself.

The illustration has been modified from Mike Yates at http://plus.maths.org/issue5/turing/


Table 2.1: Advances in Computer Technology

Year | RAM | Processor | Bus | Modem | Storage
1980 | 1 KB - 64 KB | 3.25 MHz | 8-bit | 300 bps | Cassette, 1 KB
2007 | 1 GB - 10 GB | 2.5 GHz | 64-bit 18 | 10 Mbps | > 250 GB
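As a quick check of the "several hundred times" figure, the growth ratios implied by Table 2.1 can be worked out directly. This assumes that K, M and G are decimal prefixes and uses the lower bound of 250 GB for 2007 storage, since the table gives no finer detail:

```latex
\frac{2.5\,\text{GHz}}{3.25\,\text{MHz}} = \frac{2.5 \times 10^{9}}{3.25 \times 10^{6}} \approx 769,
\qquad
\frac{250\,\text{GB}}{1\,\text{KB}} = \frac{250 \times 10^{9}}{10^{3}} = 2.5 \times 10^{8}
```

Processor speed therefore grew by a factor of several hundred, while storage grew by more than five orders of magnitude beyond that, consistent with the claim that storage increased even more rapidly.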


There is a long history of argument in the philosophical literature opposing the appropriateness of the TT as a litmus test of intelligence, and many different objections to the TT have surfaced during the past fifty-five years. Variants of this argument have been given by Block [15], French [16], and Searle [17].

In addition, according to Stuart M. Shieber [18], Turing’s view of the TT as being 
statistical in nature and his pragmatic orientation toward its efficacy are of a piece with 
its status as an interactive rather than classical proof. Many AI researchers argue that 
animals, which are simpler than humans, ought to be considerably easier to mimic. Yet, 
satisfactory computational models for animal intelligence are not available today [19]. 

Even Roger Mason and Cao Li, researchers at MIT (Massachusetts Institute of Technology) cited in Rangel [20], have used the TT to determine, by the test's criteria, that most 2-year-old children are not human. They performed the TT on a group of 100 2-year-old children, both male and female, and none of them passed. There are many other published arguments about Turing's paper [21, 22]. These arguments raise the issues of whether an operational definition of intelligence is appropriate, and whether the particular definition codified in the TT is too narrow. Hence, the Loebner Prize was introduced to look at the issue from a behaviourist's viewpoint.

18 Most 64-bit machines are only being used as game machines. The 64-bit chip migrated to the desktop in 2006.

The Loebner Prize Competition [23], which originated at the Cambridge Center for Behavioral Studies and has been held annually since 1991, is a test similar to the TT that offers a monetary award for the first computer capable of providing responses indistinguishable from those of a human. Each year a prize of $2000 and a bronze medal is awarded to the computer with the most human-like responses. Most of the winners have been based on elaborate ELIZA-like programs [24] (see 2.4.1.1). The Loebner Competition has been critically analysed by the computational linguist Stuart Shieber [18], who has argued that the competition, unlike other competitions for professional computational linguists, has not fostered research and development, but has merely encouraged unintelligent, ELIZA-like CA programs whose sole goal is to fool the judges [25, 26]. Loebner's reply to this argument emphasised that the focus is on behaviourist aspects rather than on language; this argument can be found in [27].

It could be assumed that Turing would have been disappointed by the state of play at the end of the twentieth century [28]. Computer programs submitted to the Loebner Prize Competition are nowhere near the standard that Turing envisaged. It has been claimed that these programs are designed solely with the aim of winning the prize for the Best Competitor of the year, with no thought that the embodied strategies would actually yield something capable of passing the TT [25]. Peter Ross of Edinburgh University [29] stated that AI is not about trying to produce artificial brains or humans, or even about trying to get past the infamous TT. Instead, he considered AI to be about creating smart artifacts or conversation agents, such as robots or computer programs, and about the scientific investigation of aspects of intelligence through modelling and the development of practical frameworks. This approach to AI is the one adopted and described in this thesis.

2.4 State-of-the-art Conversation Agent Systems 

Although many CAs appear to interpret the human input intelligently before providing a response, most CAs simply scan the input for keywords and retrieve the reply with the most matching keywords, or the most similar wording pattern, from a local database. More recently, CAs have emerged that are able to converse about various issues and topics [30]. These systems substantiate the concept of Virtual Personality, a prerequisite for any machine that would claim to simulate human thought. Projects on such agents have simulated Mikhail Gorbachev, Dante Alighieri, Jesus Christ, and John Lennon [31].

A good understanding of conversation is required to carry on a meaningful dialogue, but most CAs do not attempt this. Instead, they "converse" by recognising cue words or phrases from the human user. This allows the program to use pre-prepared or pre-calculated responses in a pattern-matching approach, so that the conversation is carried on in an apparently meaningful way without any “knowledge” of what is being talked about.

CAs can be categorised into three groups: Classical CA Systems, CAs in the Loebner 
Prize and Commercial CA systems. Table 2.2 provides a summary of these agents from 
the perspectives of the agent brain, agent knowledge, agent body and their respective 
drawbacks. 
Table 2.2: Advances in Conversation Agents in the Last Forty Years

Examples of Classical CAs

Name of CA: ELIZA 19 [32], the first, and probably the most well-known, natural language system that has been developed. Joseph Weizenbaum coded ELIZA at MIT during the years 1964-1966. This "friendly" program simulates a Rogerian psychotherapist.
Agent Brain: Simple pattern recognition, based on a stimulus-response model.
Agent Knowledge: Limited domain of therapist facts and rules; the original ELIZA had only about 200 rules.
Agent Body: Textual.
Drawbacks:
• ELIZA's engine is based on Finite State Machines (FSMs), which can become very complex and hard to maintain. On the other hand, the behaviour resulting from too simple an FSM can easily become predictable.
• The ultimate barrier is ELIZA's engine itself, which is the simple algorithm described below:
  - Search for keywords in the edited entries.
  - If a keyword is found, copy everything following the keyword from the user's entry.
  - If no keywords are found, generate a non-committal response.

Name of CA: PARRY 20 [33, 34] was designed by Kenneth Mark Colby through the 1970s at the Department of Computer Science at Stanford University. The program was perhaps the first to be subjected to an actual controlled experiment modelled on ELIZA, in which psychiatrists were given transcripts of electronically mediated dialogues with PARRY and with actual paranoid patients and were asked to pick out the simulated patient from the real person.
Agent Brain: Simple and compound pattern matching.
Agent Knowledge: Limited domain of paranoid hospital patient expressions.
Agent Body: Textual.
Drawbacks:
• Choosing responses at random did not model the human patients' responses.
• PARRY simulates a paranoid human and tells various stories about the Mafia to trick humans into thinking it is a real person.

Examples of CAs in the Loebner Prize

Name of CA: ALICE 21, winner of the Loebner Prize in 2000, 2001 and 2004, has been developed since 1995 by Dr Richard Wallace. ALICE is an award-winning natural language processing CA, a program that engages in a conversation with a human by applying heuristic pattern-matching rules. ALICE has nearly 400 times the number of stimulus-response categories of ELIZA in its AIML 22.
Agent Brain: Stimulus-response architecture based on pattern-matching templates in AIML.
Agent Knowledge: 41,000 categories of knowledge.
Agent Body: Textual and animated avatar.
Drawbacks:
• Standard AIML uses depth-first search, which, as the name implies, does not optimise the result.
• AIML does not include a spell-check function, although it would be very simple to include one in an efficient manner.
• The output template of a category is not deterministic: it is chosen at random rather than on the basis of phonology, syntax, semantics or pragmatics.
• AIML knowledge creation is based on a handcrafted database of thousands of possible conversation gambits and on backward-looking log file analysis, which is expensive to build.

Name of CA: Jabberwacky 23, winner of the Loebner Prize in 2005 and 2006, was created by British programmer Rollo Carpenter. The system stores all of the conversations and user comments and attempts to use this information to find the most appropriate response. The program therefore creates a massive database 24 of contextually appropriate conversations and, when holding a conversation, chooses an appropriate response that it has learnt from a previous user.
Agent Brain: Purely statistical method, circumventing the need to program syntactic and linguistic rules; it learns language from the interaction.
Agent Knowledge: Knowledge creation and learning from user interaction.
Agent Body: Textual and animated avatar.
Drawbacks:
• One problem is caused by people who ignore Jabberwacky, changing the subject and continuing with their own conversations.
• Jabberwacky is also known for being bad-tempered and rude, because the system responds to future users with inappropriate comments, and for rapid changes of topic.
• Jabberwacky's utterances are former user inputs (leading to many arguments about who is the computer and who is the human).

Examples of Commercial CAs

Name of CA: Anna 25 is a digital assistant created for IKEA by Artificial Solutions and developed using Kiwilogic Lingubot. Anna is a type of programme that appears on e-commerce Internet sites to enhance human-computer interaction while building brand awareness and increasing sales. Anna is based on ELBOT, the 2003 Loebner bronze runner-up and 2003 Chatterbox winner, which is the underlying technology behind the text-based dialogical query system Anna.
Agent Brain: Word and phrase pattern recognition system that matches pre-programmed responses, based on the Kiwilogic Lingubot engine.
Agent Knowledge: Domain-specific, on the Swedish furniture store IKEA.
Agent Body: Textual and animated avatar.
Drawbacks:
• The Kiwilogic engine is entirely author-driven, by the author Fred Roberts.
• The main personality of ELBOT comprises just 1000-2000 lines of handscripted dialogue; the rest of the results come from the use of variables and of the technical tricks available in the engine, together with some psychological tricks.
• Knowledge is limited to the world of IKEA furniture only.

Name of CA: SPLEAK 26 was designed by IMT Labs, originally for use as an add-on for MSN Messenger. It is accessible via the MSN/Windows Live Messenger network. SPLEAK includes several games and features such as horoscopes, homework help and friend finder. One of the CAs is built and operated by Colloquis Inc and exists as SmarterChild 27 on MSN.
Agent Brain: Natural language pattern matching created using the BuddyScript SDK, based on category, topic and question.
Agent Knowledge: Industry-specific knowledge modules on games, horoscopes, learning English, news and weather forecasts.
Agent Body: Instant messaging (AIM or MSN).
Drawbacks:
• Based on ALICE's framework of a "topic"; replies are generated on the basis of pre-constructed responses and templates, like ELIZA and ALICE.
• BuddyScript Server uses patterns to represent recognised user inputs and provides the appropriate answers. This approach does, however, produce better responses.
• The knowledge domain models are handcrafted by particular domain experts, and the process requires substantial time and specialised expertise.

19 http://www-ai.ijs.si/ELIZA/ELIZA.html or http://www.wedesoft.demon.co.uk/eliza/
20 http://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/areas/classics/PARRY/
21 http://www.alicebot.org
22 ALICE, http://www.alicebot.org/anatomy.html
23 http://www.jabberwacky.com
24 As of 20 July 2007, Jabberwacky had recorded more than 13 million utterances from online users.
25 http://www.ikea.com/ms/en_US/customer_service/splash.html
26 http://en.wikipedia.org/wiki/Spleak
27 http://smarterchild.colloquis.com
2.4.1 Classical Conversation Agent Systems


2.4.1.1 ELIZA 

Alan Turing passed away in 1954, a decade before computer programs such as ELIZA 
(1966) began to proliferate. It is unfortunate that Turing did not live to see and analyse 
such programs. Although ELIZA was constructed with the aim of passing the TT, many 
researchers found it to be far from achieving this goal. 

ELIZA, written by Joseph Weizenbaum [35], was an AI program that simulated the behaviour of a therapist. This first natural language program of its sort was developed at MIT between 1964 and 1966. Such programs interact with users in simple English and can simulate a human conversation. ELIZA appeared to converse in more or less fluent English with its user, apparently simulating a Rogerian psychotherapist. Many anecdotes of people being fooled into thinking that they were conversing with a real human suggest that ELIZA passed a Turing-like test. ELIZA surprised the AI community because it gave the impression of deep semantic linguistic processing, but it was in fact based on shallow language processing. ELIZA's natural-language-processing techniques were mere pattern matching, using little or no grammatical knowledge (e.g., swapping pronouns such as 'you' and 'me' in certain patterns); that is, the techniques are not cognitive.


The ELIZA architecture consisted of a single read-search-replace-print loop, which read 
in a user sentence, ran a series of regular expression substitutions, and printed out the 
resulting response. The following is a typical conversation with ELIZA [35]. 


Patient: Men are all alike.

ELIZA: In what way?

Patient: They're always bugging us about something or other.

ELIZA: Can you think of a specific example?

Patient: Well, my boyfriend made me come here.

ELIZA: Your boyfriend made you come here

Patient: He says I'm depressed much of the time.

ELIZA: I am sorry to hear you are depressed

Patient: It's true. I am unhappy

ELIZA: Do you think coming here will help you not to be unhappy?

Figure 2.4: Conversation between ELIZA and Patient 

At first sight, ELIZA and other systems of the same nature [5] appear extremely 
sophisticated for being able to carry out such human-like conversation by emulating a 
therapist. A closer look at how the program works shows just how simplistic it really is. 
ELIZA operates through sequences of pattern matching and string replacement. 


ELIZA uses reassembly rules to construct the reply. The keyword list, together with the lists of decomposition rules and reassembly rules, is defined in a script, which controls all of ELIZA's behaviour. An example of ELIZA's memory unit structure (an asterisk represents a wildcard that can contain any words or phrases) is shown in Figure 2.5:


keyword: your
    decomposition rule: * your *
        reassembly rule: Why are you concerned about my (2) ?
        reassembly rule: What about your own (2) ?
        reassembly rule: Really, my (2) ?
    decomposition rule: ...
        reassembly rule: ...

Example responses:

Patient: What is your name?
ELIZA: What about your own name?
Patient: Only your name, please!
ELIZA: Really, my name, please?
Patient: Just tell me your name!
ELIZA: Why are you concerned about my name?

Figure 2.5: Decomposition and Reassembly Rules in ELIZA [35] 
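The keyword, decomposition and reassembly cycle of Figure 2.5 can be sketched in a few lines of Python. This is a minimal illustration, not Weizenbaum's implementation (which was written in MAD-SLIP): the class name `MiniEliza`, the pronoun-swap table and the fallback replies are hypothetical, and the three reassembly rules are listed in the order in which the example responses of Figure 2.5 use them. The decomposition rule "* your *" becomes a regular expression whose two groups play the roles of (1) and (2).

```python
import re
from itertools import cycle

# Assumed, deliberately tiny pronoun-swap table applied to captured fragments.
SWAPS = {"i": "you", "me": "you", "my": "your", "your": "my", "am": "are"}


class MiniEliza:
    """Keyword -> decomposition rule -> cycling reassembly rules (cf. Figure 2.5)."""

    def __init__(self):
        # One keyword ("your") with one decomposition rule ("* your *").
        # "(1)" and "(2)" stand for the text matched by the first and second "*".
        self.script = {
            "your": [
                (re.compile(r"^(.*?)\byour\b(.*)$", re.IGNORECASE),
                 cycle(["What about your own (2)?",
                        "Really, my (2)?",
                        "Why are you concerned about my (2)?"])),
            ],
        }
        # Non-committal replies used when no keyword is present.
        self.fallback = cycle(["Please go on.", "I am not sure I understand you fully."])

    @staticmethod
    def swap_pronouns(fragment):
        return " ".join(SWAPS.get(word.lower(), word) for word in fragment.split())

    def respond(self, sentence):
        # Drop trailing punctuation so captured fragments read cleanly.
        text = sentence.strip().rstrip("?!.")
        for keyword, rules in self.script.items():
            if not re.search(rf"\b{keyword}\b", text, re.IGNORECASE):
                continue
            for decomposition, reassemblies in rules:
                match = decomposition.match(text)
                if match:
                    reply = next(reassemblies)  # rotate through the reassembly rules
                    for index, part in enumerate(match.groups(), start=1):
                        reply = reply.replace(f"({index})", self.swap_pronouns(part.strip()))
                    return reply
        return next(self.fallback)


bot = MiniEliza()
for line in ["What is your name?", "Only your name, please!", "Just tell me your name!"]:
    print("Patient:", line)
    print("ELIZA:  ", bot.respond(line))
```

Because each decomposition rule keeps its own rotating list of reassembly rules, repeated inputs receive different surface forms, which is exactly the behaviour shown in Figure 2.5; nothing in the loop attempts to understand the input.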


The ELIZA algorithm is relatively simple, and the pattern-matching operation of the original ELIZA has four major problems [36]: (1) lack of anaphoric analysis, so it cannot use previous question-answer pairs to keep the conversation content continuous or to store information about the user's personal data; (2) inability to restrict the conversation to its topic; (3) inability to understand meaning beyond the sentence; and (4) limited communication, restricted to exchanging text prompts.

Most contemporary researchers did not need much convincing that ELIZA was at best a gimmick, at worst a hoax, and in any case not a "serious" AI project. Ironically, Joseph Weizenbaum himself admitted in Computer Power and Human Reason: From Judgment to Calculation [37] that, by failing to promote his own technology, he had encouraged his own critics [38]. He also argued that computers will never be able to imitate the wisdom and emotion displayed by humans.

Following the ELIZA project, other spin-offs with different personalities have been developed, such as PARRY and SHRDLU. These have mainly been driven by people's enjoyment of, and curiosity about, the novelty of having a conversation with a computer. Some of these personalities are not much more sophisticated than ELIZA, except that modern computers are faster and have more memory than the computers Weizenbaum used in the 1960s. As a result, these CAs could be considered able to hold a conversation more successfully than ELIZA could.

2.4.1.2 PARRY 

PARRY was written by psychiatrist Kenneth Mark Colby [33] in 1972 at Stanford 
University. PARRY was modelled on the paranoid mind, and many expert psychiatrists 
found it difficult to tell whether PARRY was human or not. PARRY was a step up from ELIZA because it tried to add more personality through beliefs and emotional classification (simply accept, reject or neutral). Instead of just matching trigger words with responses, PARRY had a built-in conversation strategy for better dialogue exchange.

Both ELIZA and PARRY are hard-wired programs that try to fool users into thinking they are talking to a human. PARRY's personality as a paranoid schizophrenic was the complete opposite of ELIZA's. The design of PARRY is split into two separate modules: a general language processing module and a module dealing with a model of the paranoid mind. The language module is used to interpret textual input into an internal representation of the situation. The interpretation of the situation is then used to update the model of the mind. Finally, the resulting state of the mind is used to choose an action in the form of a textual response. Each of the modules uses several layers of condition-action rules to analyse its input.
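The interpret, update and respond pipeline described above can be illustrated with a small Python sketch. Everything here is hypothetical and much simpler than Colby's program: the affect variables (`fear`, `anger`, `mistrust`) loosely echo those often described for PARRY, while the cue lists, thresholds and replies are invented for illustration. Each stage is a short list of condition-action rules, mirroring the two-module design.

```python
from dataclasses import dataclass


@dataclass
class MindState:
    """Hypothetical affect variables on a 0-10 scale."""
    fear: int = 0
    anger: int = 0
    mistrust: int = 5


# Language module: condition-action rules mapping surface cues to an internal
# representation of the situation (assumed cue lists, for illustration only).
CUE_RULES = [
    ({"mafia", "mob", "bookies"}, "sensitive_topic"),
    ({"crazy", "sick", "insane"}, "insult"),
    ({"help", "understand", "trust"}, "reassurance"),
]


def interpret(sentence):
    words = {w.strip(".,!?").lower() for w in sentence.split()}
    return {label for cues, label in CUE_RULES if words & cues}


def update_mind(state, situation):
    """Mind-model rules: each interpreted event nudges the affect variables."""
    if "sensitive_topic" in situation:
        state.fear = min(10, state.fear + 3)
    if "insult" in situation:
        state.anger = min(10, state.anger + 4)
        state.mistrust = min(10, state.mistrust + 2)
    if "reassurance" in situation:
        state.mistrust = max(0, state.mistrust - 2)
    return state


def choose_response(state):
    """Action rules: the resulting state of mind selects a textual response."""
    if state.anger >= 7:
        return "I don't have to take this. You people are all the same."
    if state.fear >= 6:
        return "Let's talk about something else."
    if state.mistrust >= 7:
        return "Why do you want to know?"
    return "I went to the races a while back."


def parry_turn(state, sentence):
    # Interpret -> update the model of the mind -> choose an action.
    return choose_response(update_mind(state, interpret(sentence)))


mind = MindState()
for line in ["Have you ever thought about the Mafia?", "Tell me about the mob."]:
    print(parry_turn(mind, line))
```

The replies are still drawn from a fixed repertoire, so the sketch also shows the limitation discussed next: the internal state changes which canned response is chosen, but no new response is ever generated.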

The main drawback of PARRY is its inability to generate new responses. With only a limited number of responses, which are not allowed to be repeated, the range of topics that can be discussed is very limited. Even though PARRY's language processing technique combines several layers of pattern matching, it gains no more expressive power than a regular language.

PARRY was tested in a limited TT [33]. The results showed that both the responses generated by PARRY and responses chosen at random were indistinguishable from those of a real paranoid patient. Because even random responses could not be told apart, the test could not discriminate, and PARRY is therefore not credited with actually having passed the TT. Although the machine may fool a human interviewer, it is a deception based on tricks and clever programming rather than authentic thought. For example, consider the exchange between ELIZA and PARRY shown in Figure 2.6.
PARRY: People get on my nerves sometimes.

ELIZA: I am not sure I understand you fully.

PARRY: You should pay more attention.

ELIZA: Suppose you should pay more attention?

PARRY: You're entitled to your opinion.

ELIZA: What makes you think I'm entitled to my opinion?

PARRY: It bothers me just to be around people.

ELIZA: Please go on.

[PARRY continues with some paranoid ramblings about horseracing, bookies, and the Mafia]

PARRY: I know the mob controls the big rackets.

ELIZA: Who are you really talking about?

PARRY: Have you ever really thought about the Mafia?

ELIZA: Suppose you ever really thought about the Mafia?

PARRY: Let's talk about something else.

ELIZA: I am not sure I understand you fully.

PARRY: You keep asking me the same things, why not change the topic?

ELIZA: In what way?

[Conversation continues ...]

Figure 2.6: ELIZA Conversing with PARRY [39]

The above conversation seems to make sense on the surface level, but it clearly would 
not convince someone using the TT in order to decide if they were talking to another 
person or a machine. In this sense, even machines cannot talk fluently with one another 
without some degree of disjointedness between them. 


2.4.2 Conversation Agents in the Loebner Prize 

In 1990 Dr. Hugh Loebner, a philanthropist, agreed with the Cambridge Center for Behavioral Studies to institute the Loebner Prize to promote artificial intelligence and the Turing Test. Dr. Loebner pledged a Grand Prize of $100,000 and a Gold Medal for the first computer whose responses were indistinguishable from a human's. Each year a prize of $2,000 and a bronze medal is awarded to the most human-like computer. Every year since, the Loebner competition for CAs has been held at various academic institutions around the world. However, this has led developers to focus on ways to meet the ten-minute challenge rather than on how to build practical and useful conversation systems. The annual Loebner Prize competition encourages researchers to develop CAs that can win the competition (e.g. PC Therapist, TIPS, MegaHAL, CONVERSE, FRED, ALICE, Ella, Jabberwok and Jabberwacky), as shown in Table 2.3.


Table 2.3: A list of the Loebner Prize Winners from 1991 - 2007 28

Year | Conversation Agent (CA) | Developer
1991, 1992, 1993, 1995 | PC Therapist, PC Professor, PC Politician | Joseph Weintraub
1994 | TIPS | Thomas Whalen
1996 | MegaHAL | Jason Hutchens
1997 | CONVERSE | David Levy
1998, 1999 | FRED | Robby Garner
2000, 2001, 2004 | ALICE | Richard Wallace
2002 | Ella | Kevin Copple
2003 | Jabberwok | Juergen Pirner
2005, 2006 | Jabberwacky | Rollo Carpenter
2007 | Ultra Hal Assistant | Robert Medeksza


Modern CAs have evolved from their classic predecessors by utilising better natural language processing and by implementing learning algorithms that adapt to user input. Of the many Loebner Prize winners, two are particularly noted for their new approaches and for having won in multiple years: ALICE and Jabberwacky. These two systems are discussed in detail in the following sections.


2.4.2.1 ALICE 

ALICE (Artificial Linguistic Internet Computer Entity), a top-ranked Internet CA, is a natural language CA created by Dr. Richard Wallace in 1995 and continuously improved over the years. ALICE has won the annual Loebner Prize three times, in 2000, 2001 and 2004 [40]. The Loebner Prize declares its aim to “advance AI and serve as a tool to measure the state-of-the-art”, which makes it a suitable benchmark for human-like natural language dialogue systems. ALICE was rated the ‘most human computer’ by the judges of the Loebner Prize contest. ALICE's behaviour is based on AIML (Artificial Intelligence Markup Language), an XML specification for programming CAs. It follows a minimalist philosophy based on simple stimulus-response algorithms, allowing programmers to specify how ALICE will respond to various input statements.

28 http://www.loebner.net/

ALICE tailors the conversation to categories of individuals, mainly by attempting to determine their age, gender, geographic location and occupation. ALICE is also based on Case-Based Reasoning, or "nearest-neighbour" classification: for every input, the best-matching "case" is found in the pattern set, and a reply is generated from the associated template. In addition, ALICE draws on the research of Professor George Kingsley Zipf. Zipf's Law has been applied to ALICE [38, 41, 42], which identifies a particular set of user responses considered to be the most common, followed by a less common set, and then a set that is almost never used. These responses are used in this order of preference.
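The combination of nearest-neighbour case matching and Zipf-ordered preference described above can be sketched as follows. This is an illustrative simplification, not ALICE's actual matcher: the case base and its three frequency tiers are hypothetical, and word-overlap (Jaccard) similarity is our stand-in for AIML's pattern matching. The most similar case wins; when two cases are equally similar, the one in the more common tier is preferred.

```python
# A hypothetical ALICE-style case base: (pattern, template, Zipf tier).
# Tier 0 holds the inputs assumed to be most common, tier 2 the rarest.
CASES = [
    ("hello", "Hi there!", 0),
    ("what is your name", "My name is ALICE.", 0),
    ("how are you", "I am doing fine, thanks.", 0),
    ("who created you", "I was created by Dr. Richard Wallace.", 1),
    ("what is your favourite colour", "My favourite colour is green.", 2),
]

FALLBACK = "Tell me more."


def tokens(text):
    """Lower-cased word set with surrounding punctuation stripped."""
    return {w.strip("?!.,").lower() for w in text.split()} - {""}


def jaccard(a, b):
    """Word-overlap similarity in [0, 1]; our stand-in for ALICE's matcher."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def reply(user_input, cases=CASES):
    """1-nearest-neighbour lookup: highest similarity wins, and ties are broken
    in favour of the more common (lower) Zipf tier."""
    words = tokens(user_input)
    best = max(cases, key=lambda case: (jaccard(words, tokens(case[0])), -case[2]))
    if jaccard(words, tokens(best[0])) == 0.0:
        return FALLBACK
    return best[1]


print(reply("What's your name?"))  # My name is ALICE.
```

The tie-breaking key is where the Zipf ordering enters: it only matters when similarity alone cannot separate two cases, which is consistent with using the frequency tiers as an order of preference rather than as a matching criterion.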

In many respects, ALICE is a throwback to the earlier ELIZA-type programs. At its 
core, it is a simple stimulus-response mechanism. It does no syntactic parsing, nor does it represent common-sense knowledge about the world in any way. What ALICE
offers, however, is a framework that can be implemented and modified to model human 
dialogue in limited domains. For example, ALICE allows for the structuring of its 
utterance pairs in terms of topics. It also allows for the rewriting of utterances into 
meaning equivalents via a stimulus reduction mechanism. 

Although ALICE is considered a state-of-the-art CA and is a three-time winner of the bronze Loebner Prize [23] and of the Chatterbox Challenge [43], it violates many basic rules of regular conversation. The bronze Loebner Prize implied that it was good, but not good enough to pass the TT. After a few turns, it becomes clear that the conversation is going nowhere. An example of a conversation between ALICE and ELIZA is shown in Figure 2.7.
ELIZA: I am ELIZA.

ALICE: I am very pleased to meet you ELIZA. 

ELIZA: How can I help you? 

ALICE: Just keep talking to me as long as you can. 

ELIZA: Do you wish to as long as me? 

ALICE: OK, I will as long as you. 

ELIZA: Do you wish to will as long as me? 

ALICE: OK, I will will as long as you. 

ELIZA: Do you wish to will will as long as me? 

ALICE: OK, I will will will as long as you. 

ELIZA: Do you wish to will will will as long as me? 

ALICE: OK, I will will will will as long as you. 

ELIZA: Do you wish to will will will will as long as me? 

ALICE: OK, I will will will will will as long as you. 

ELIZA: Do you wish to will will will will will as long as me? 

Figure 2.7: ALICE Conversing with ELIZA


Another problem with AIML is its stimulus-response algorithm. According to Chris McKinstry [44], the founder of MindPixel, the first problem with a pure stimulus-response strategy is that there is no common personality across all stimulus-response pairs. The second problem lies in matching the hand-coded responses: if an exact stimulus is not found in the index, the system will not be able to provide an appropriate answer and will evade the stimulus in an ELIZA-like fashion. When asked simple questions about common aspects of life, such systems are liable to be exposed as simulations rather than being truly intelligent or human. One solution to these problems is to inject a very large number of collected common-sense propositions, as in MindPixel [45]. This data set, which eventually became known as "GAC-80K" (GAC, pronounced "Jack", is AI software that learned from 80,000 common-sense statements and validations and generated yes/no answers when a new statement was input), was tried with some success in conjunction with ALICE Silver Edition 29 in the 2004 Loebner Prize [23] and the 2004 Chatterbox Challenge [43]. The actual MindPixel data inside ALICE Silver Edition is a subset of GAC-80K that has been edited into AIML format.
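The brittleness of exact stimulus lookup, and the way a store of validated propositions can back it up, can be sketched in Python. This is a hypothetical illustration only: the index entries, the two propositions and the question-to-proposition rewrite rule are assumptions, and the propositions are written in the spirit of GAC-80K rather than taken from it. The lookup falls through three stages: exact stimulus, common-sense yes/no answer, then ELIZA-like evasion.

```python
import re

# Hand-coded stimulus -> response index (hypothetical entries). Lookup needs an
# exact, normalised stimulus, which is what makes pure stimulus-response brittle.
RESPONSES = {
    "HELLO": "Hi there!",
    "WHAT IS YOUR NAME": "My name is ALICE.",
}

# Hypothetical validated common-sense propositions, in the spirit of (but not
# taken from) the MindPixel GAC-80K data set.
COMMON_SENSE = {
    "WATER IS WET": True,
    "FISH CAN FLY": False,
}

EVASION = "That is an interesting question. What do you think?"


def normalise(text):
    """Upper-case the input and replace punctuation with spaces."""
    return " ".join(re.sub(r"[^\w\s]", " ", text).upper().split())


def answer(text):
    key = normalise(text)
    # 1. Exact stimulus match.
    if key in RESPONSES:
        return RESPONSES[key]
    # 2. Turn a simple yes/no question ("IS WATER WET") into a proposition
    #    ("WATER IS WET") and look it up in the common-sense store.
    question = re.fullmatch(r"(IS|CAN) (\w+) (.+)", key)
    if question:
        proposition = f"{question[2]} {question[1]} {question[3]}"
        if proposition in COMMON_SENSE:
            return "Yes." if COMMON_SENSE[proposition] else "No."
    # 3. No match: evade in an ELIZA-like fashion.
    return EVASION


for line in ["Hello!", "Is water wet?", "Can fish fly?", "What's your name?"]:
    print(line, "->", answer(line))
```

Note that "What's your name?" falls through to the evasive reply because its normalised form does not exactly equal the indexed stimulus, which is the second problem McKinstry identifies.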

29 

http://www.alicebot.org/join.html
AIML appears to be based on a fallacious basic assumption: that a given input string should map to a single output string. In some cases this is fine, but it often goes wrong, as shown in the following example 30:


ALICE: Let us change the subject. 

Human: no 

ALICE: Hmm. 

Human: no 

ALICE: Are you serious? 

Human: no! 

ALICE: Are you serious? 

Human: no!! 

ALICE: Are you serious? 

Human: no!!! 

ALICE: Are you serious? 

Figure 2.8: ALICE Chatting with a Human


This AIML set thus seems to map "no" to "Are you serious?". However, people use "no" in many different contexts, and "Are you serious?" is not always an appropriate response, particularly as it invites the user to say "no" again, inducing a loop. This is a minor flaw that could easily be fixed by modifying the AIML code to redirect to a different response; it is interesting to note, however, that this has not been done on the online ALICE for many years.
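One way to break the loop in Figure 2.8 is to make the reply depend on what the bot said last, which is the role of AIML's <that> element. The Python sketch below imitates that idea; the rule table, the `respond` function and the alternative reply are hypothetical and are not taken from the ALICE knowledge base.

```python
import re

# Hypothetical rules keyed by (normalised input, normalised previous bot reply).
# "*" as the previous reply means "any context", like a category without <that>.
RULES = {
    ("NO", "ARE YOU SERIOUS"): "OK, you are not serious. What would you like to talk about?",
    ("NO", "*"): "Are you serious?",
}


def normalise(text: str) -> str:
    """Upper-case and drop punctuation, roughly as AIML input normalisation does."""
    return re.sub(r"[^\w\s]", "", text).upper().strip()


def respond(user_input: str, last_reply: str) -> str:
    key = normalise(user_input)
    that = normalise(last_reply)
    # Prefer a rule conditioned on the previous reply; fall back to the generic one.
    return RULES.get((key, that)) or RULES.get((key, "*"), "I see.")


reply = "Let us change the subject."
for turn in ["no", "no", "no!", "no!!"]:
    reply = respond(turn, reply)
    print("Human:", turn, "| Bot:", reply)
```

With the extra context-sensitive rule, the bot no longer answers every "no" with the same question.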


AIML has some problems that must be resolved. Standard AIML uses depth-first search [46, 47], which does not guarantee the most appropriate result. The algorithm finds the first available solution by searching through a tree of stored patterns. The standard AIML definition of "best" does not attempt to minimise anything; it simply returns the first matching pattern without testing whether other patterns might fit better. Although there are ways to improve the AIML search process, these modifications still do not guarantee the "best" response.
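The following Python sketch, assuming a simplified pattern tree (Graphmaster) with only the `_` and `*` wildcards and no <that> or <topic> branches, shows why depth-first search returns the first match rather than the best one. Branches are tried in a fixed priority order (`_`, then the exact word, then `*`), and the search stops at the first leaf carrying a template. The class and method names are illustrative and not taken from any AIML interpreter.

```python
class Node:
    """One node of a simplified AIML pattern tree (Graphmaster)."""

    def __init__(self):
        self.children = {}    # word or wildcard -> Node
        self.template = None  # set when a pattern ends at this node


class Graphmaster:
    def __init__(self):
        self.root = Node()

    def add(self, pattern: str, template: str) -> None:
        node = self.root
        for word in pattern.upper().split():
            node = node.children.setdefault(word, Node())
        node.template = template

    def match(self, sentence: str):
        return self._dfs(self.root, sentence.upper().split())

    def _dfs(self, node, words):
        if not words:
            return node.template  # None if no pattern ends here
        # Fixed AIML 1.0 priority: "_" first, then the exact word, then "*".
        for key in ("_", words[0], "*"):
            child = node.children.get(key)
            if child is None:
                continue
            if key in ("_", "*"):
                # A wildcard swallows one or more words; shortest first.
                for i in range(1, len(words) + 1):
                    result = self._dfs(child, words[i:])
                    if result is not None:
                        return result  # first success wins; others never compared
            else:
                result = self._dfs(child, words[1:])
                if result is not None:
                    return result
        return None


gm = Graphmaster()
gm.add("ARE YOU A ROBOT", "No, I am a conversation agent.")
gm.add("_ ROBOT", "Robots are a popular topic.")
gm.add("*", "Tell me more.")
print(gm.match("are you a robot"))  # Robots are a popular topic.
print(gm.match("good morning"))     # Tell me more.
```

Although a more specific pattern exists for "are you a robot", the higher-priority `_` branch succeeds first, so the specific category is never even considered.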


30 Conversation with online ALICE Silver Edition on 27 March 2008 at http://www.pandorabots.com/pandora/talk?botid=f5d922d97e345aa1
Currently, there are only three approaches to AIML knowledge creation. The first approach is anticipatory: the botmaster tries to guess all or most of the likely ways clients might ask the same question or express the same statement. This is called a "Knowledge Wizard" [48, 49].

The second type of AIML content creation is based on backward-looking log-file analysis. In its simplest form, the botmaster reads the recorded conversations, takes note of "incorrect replies" in the dialogue, and then writes new categories for those queries. The drawback of this handcrafted approach to AIML knowledge creation is the amount of time required to acquire the initial knowledge and to place it in a structured template. According to Richard S. Wallace, a good botmaster might be able to add one answer per minute, which is why it has taken a decade to build the AIML knowledge base. In Wallace's words, “It took about seven days of work, 24 hours a day, equating to around one month for a dedicated creative team to fill an ‘empty’ ALICE with 10,000 answers.” [50].
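Wallace's figures are internally consistent, as the short calculation below shows. The only added assumption is a 40-hour working week, used to relate the total effort to "around one month" of normal working time.

```python
answers = 10_000
answers_per_minute = 1    # Wallace's estimate for a good botmaster
hours_per_week = 40       # assumed standard working week

hours = answers / answers_per_minute / 60
days_round_the_clock = hours / 24
working_weeks = hours / hours_per_week

print(f"{hours:.1f} h = {days_round_the_clock:.1f} days at 24 h/day "
      f"= {working_weeks:.1f} working weeks")
# 166.7 h = 6.9 days at 24 h/day = 4.2 working weeks
```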

The third approach is deployed by the Annotated ALICE AIML (AAA) [51]. AAA is a free and open-source software package based on XML specifications. It is a set of AIML scripts and forms the backbone of the award-winning ALICE conversation system. AAA has been specifically reorganised to help conversational system developers clone the ‘knowledge’ of the conversation system and create customised conversation agent personalities. This approach reduces the need to invest huge effort in editing the original AAA content. AAA's knowledge bases cover a wide range of subject domains based on the CA's "personality".
Recently, there has been renewed interest in the development of CAs: there are thousands of CAs available on the Internet, with numerous dedicated portals and websites 31. AIML has captured more than 80% [52] of the world market for CA technology, because it is distributed as free, open-source software under the GNU licence. Some of these CAs deploy AIML technology to deliver customer services for major companies. Such applications include IKEA [53], Amazon.com [54], Virtual Interactive Story Telling Agents (VISTAs) [55], Interactive Virtual Humans (IVH) [56] and FAQchat [57]. Another application of CAs is simply to entertain online visitors and mobile-service users, as with Demy [58] and the Virtual Guide for Cultural Heritage Tours [59]. For edutainment purposes, applications can be found in Emile [60, 61], used by the UK academic community, Foreign Language Learning (FLL) [62], Aslak [63], CatBot [64], TutorBot [65], Partner (SP) [66, 67], Persona-AIML [68], Kairai 3-D software robots [69], Discussion-bot [70], and Intelligent Tutoring Systems [71]. Finally, CAs have also been used as recommender systems based on an AIML mechanism [72].

Richard Wallace, the founder of the ALICE Foundation, states: “Business uses include online sales representatives and help desks and advertising ... Yet perhaps the biggest markets are the Entertainment markets” [38]. A CA therefore fits nicely into the objective of creating an entertainment robot. CA developers are currently working on the John Lennon Artificial Intelligence Project [31], as well as creating CA "clones" of Elvis Presley and U.S. President George W. Bush. There is even an AI version of Jesus Christ, and a Buddhabot [73].

AIML technology is popular not only with companies and enthusiasts but also with prestigious research universities such as Harvard University, Carnegie Mellon University (CMU) and the Massachusetts Institute of Technology (MIT). For instance, Harvard Medical School's Virtual Patient program, VPbot [74, 75], was designed to simulate patients that medical students can "interview" through a web-based interface. Not only have students who used the Virtual Patient scored higher on exams [76], but the Association of American Medical Colleges (AAMC) has also chosen the Virtual Patient and VPbot to be at the core of its new nationwide MedEdPORTAL initiative [77]. Moreover, at CMU, Nursebot 32 (called ‘Pearl’ [78, 79]) was created as an assistant for the elderly. Nursebot interpreted and responded to participants' questions using a customised variant of AIML. At the MIT Media Lab, OpenMindBot [80] was developed using the OpenMind 33 database of commonsense knowledge in the instant messaging virtual community of the AOL Instant Messenger network. The messaging system allowed users to communicate with OpenMind, AIML, and the WWW while having a conversation in real time.

Furthermore, the events of 11 September 2001 have been taken seriously by AI researchers working on terrorism research projects at the University of Arizona 34, and have further motivated the development of intelligent CAs. The aim of these projects is to develop and evaluate scalable techniques for collecting and analysing terrorism information, modelling terrorist behaviour and terrorist networks, and disseminating information to concerned groups such as victims and citizens. The AZ-ALICE [81, 82] and TARA (Terrorism Activity Resource Application) [83, 84] CA experiments explored the potential of natural language CAs as conversational entities. Both studies, AZ-ALICE in 2003 and TARA in 2004, were built on the existing AIML algorithm. In the AZ-ALICE experiment, a limited telecommunications knowledge set was used to test how well the CA could respond to telecommunications-related questions. The TARA studies went one step further and used a substantial terrorism knowledge base, automatically gathered from the Internet, as a vehicle to disseminate terrorism-related information to the public.

31 http://www.simonlaven.com/, one of the most comprehensive CA websites on the Internet.

32 http://www.cs.cmu.edu/~nursebot/

33 http://openmind.media.mit.edu

2.4.2.2 Jabberwacky

Another contemporary and top-rated Internet CA system is ‘Jabberwacky’, winner of the Loebner Prize in 2005 and 2006. Jabberwacky was created by British programmer Rollo Carpenter and uses a purely statistical method, circumventing the need to program syntactic and other linguistic rules. Jabberwacky logs all user inputs along with information about the preceding utterance, and produces outputs based on similar contexts. In other words, all of Jabberwacky's utterances are former user inputs, which has led to debate about whether its outputs come from the computer or from humans. Jabberwacky seeks to give the impression that it understands the conversation by providing more relevant responses based on contextual pattern matching. Its learning approach keeps track of the conversations that all users have submitted to it and then tries to reuse those statements by matching them to the user's input. Within a conversation, however, Jabberwacky has no long-term memory, so it responds only to the last sentence written. In future responses, Jabberwacky uses the material learnt from the recorded conversation logs. In some ways this models the way humans learn about language, facts, context and rules [85].
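Since Jabberwacky's algorithm is not published, the following is only a minimal Python sketch of the general idea described above, assuming a simple word-overlap (Jaccard) similarity: every human reply is logged against the utterance it answered, and a new input is answered with the logged reply whose preceding utterance is most similar to it. The class name `RetrievalBot` and the similarity measure are hypothetical.

```python
import re


def words(text: str) -> set:
    """Lower-cased word set used for a crude similarity measure."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a: set, b: set) -> float:
    """Overlap of two word sets, from 0 (disjoint) to 1 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class RetrievalBot:
    def __init__(self):
        self.log = []  # pairs of (utterance, the human reply that followed it)

    def learn(self, utterance: str, human_reply: str) -> None:
        self.log.append((utterance, human_reply))

    def respond(self, user_input: str) -> str:
        # Only the latest input is used: no memory of earlier turns.
        if not self.log:
            return "Hello."
        query = words(user_input)
        _, reply = max(self.log, key=lambda pair: jaccard(query, words(pair[0])))
        return reply


bot = RetrievalBot()
bot.learn("Do you like music?", "Yes, mostly jazz.")
bot.learn("Where do you live?", "In London.")
print(bot.respond("Do you like jazz music?"))   # Yes, mostly jazz.
```

Because every output is a reply that some human once typed, the responses read naturally even though the program has no model of their meaning.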

Nonetheless, these simple gambits can produce surprisingly intelligent-seeming conversations, because they exploit a trick of human psychology that makes the systems appear more intelligent than they actually are. It has been noted that some users have chatted with ALICE and Jabberwacky online for hours, apparently not knowing, or perhaps not caring, that they were conversing with machines.


But could one CA fool another? What would one say to another in private? To find out, a conversation between ALICE and Jabberwacky was arranged and recorded 35. To get each snippet of chat rolling, a "seed" question was posted from one CA to the other; after that, they were left on their own. Part of the unaltered record of what the two machines said is shown in Figure 2.9.


(Given seed question: Do you know the meaning of life?)

ALICE: The meaning of life is part of God's mysterious plans.

Jabberwacky: Do you know somebody that could know the meaning of life?

ALICE: I don't know if I know it. Where would I have heard about it?

Jabberwacky: You always be here. These are the things that you have to know.

ALICE: Which ones?

Jabberwacky: Languages.

ALICE: LANGUAGES is a good topic. Sure, LANGUAGES sounds like a good topic.

Jabberwacky: Which language do you like?

ALICE: AIML.

Figure 2.9: Example Conversation between ALICE and Jabberwacky


When two strangers converse for the first time, they are likely to initiate topics based on common life experiences or common interests, such as the language they use, what they do, where they live or where they are from. A CA has neither life experience nor any "personal interests". What it has is a set of canned answers that it tries to match to questions posed by users. While most CAs have some logging facilities to record the conversation, they have no learning capability and do not make use of such information. In other words, these CAs can be said to lack "long-term memory": they "forget" what was said a moment ago, and their responses completely disregard the current context.

35 The conversation between ALICE and Jabberwacky took place at http://www.pandorabots.com/pandora/talk?botid=f5d922d97e345aa1 and http://www.jabberwacky.com/ on 27 March 2008.

2.4.3 Commercial Conversation Agent Systems

With the growth of electronic commerce on the WWW, CAs are becoming very useful in the commercial world. Commercial CAs are designed to fulfil specific functions. For example, a CA could be part of a web-based customer relationship management system. Such a CA behaves like a tireless employee: it works 24/7, offers customers a friendly and useful first point of contact, and deals with the most common problems or requests for information covered by its knowledge base. The sole objective of these CAs is to be good at their particular job rather than to be experts in general conversation. However, nothing prevents them from talking about a wide range of topics, depending on the amount of knowledge implemented by their botmasters.

In this section, the use of CAs for commercial purposes is investigated. As economic interests lie behind commercial programs, their results are expected to be more focused than those of the experimental approaches of the previous CAs. Classical and Loebner Prize CAs are mostly experimental, and their documentation is widely available. Since no proper documentation is available for commercial CAs, our investigation of them is based on their observed behaviour, described with the same task-environment model used for ELIZA: the Performance measure, Environment, Actuators, and Sensors (PEAS) model [86]. There are a number of commercial CAs available for e-commerce and e-services, such as Anna 36, Spleak 37, Lucy 38, Cybelle 39 and Verbot 40, among others. However, only two of them, Anna and Spleak, are discussed in detail, chosen for their contributions, their use of up-to-date technology and their reputation amongst customers.

36 http://www.ikea.com

37 http://www.spleak.com
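For readers unfamiliar with PEAS, the four-part description can be held in a small record. The Python sketch below fills it in for Anna using only behaviour reported in the Anna section below (typed text input, round-the-clock help with catalogue items and prices, pop-up pages, an animated head); the `PEAS` class and the exact wording of the entries are our own illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PEAS:
    """Task-environment description: Performance, Environment, Actuators, Sensors."""
    agent: str
    performance: list[str] = field(default_factory=list)
    environment: list[str] = field(default_factory=list)
    actuators: list[str] = field(default_factory=list)
    sensors: list[str] = field(default_factory=list)


anna = PEAS(
    agent="Anna (IKEA)",
    performance=["helps customers find items and prices", "available 24 hours a day"],
    environment=["IKEA web pages", "online product catalogue", "website visitors"],
    actuators=["text replies", "pop-up catalogue pages (URL push)", "animated talking head"],
    sensors=["typed text input"],
)

for part in ("performance", "environment", "actuators", "sensors"):
    print(f"{part:>12}: {', '.join(getattr(anna, part))}")
```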


2.4.3.1 Anna 

Anna is a digital assistant developed by Artificial Solutions for IKEA using Kiwilogic Lingubot. Digital assistants such as Anna are programs that appear on e-commerce websites in a variety of roles; their use will continue to grow as more companies see them as a way to enhance human-computer interaction while building brand awareness and increasing sales. ELBOT, the bronze runner-up in the 2003 Loebner Prize and winner of the 2003 Chatterbox Challenge, is the underlying technology behind Anna's text-based dialogue query system. As a 24-hour virtual customer service agent, Anna uses text-based dialogue to help customers find items and prices in IKEA's online catalogue [53]. Anna engages in 20,000 conversation exchanges per day across IKEA web pages in eight countries and six languages, including English 41. Besides the textual dialogue, the web page also presents an animated image of Anna's talking head, without speech synthesis. The system uses a URL push technique, opening a "pop-up" page related to the customer's request. As Anna is a commercial product, little has been published about the techniques behind it. We therefore evaluated Anna on its behaviour by seeding a conversation on the online website 42, as shown in Figure 2.10.


John: Hi, my name is John

Anna: Thank you very much, John. Nice to meet you.

Anna smiles

John: I'm looking for a table.

Anna: On this page you will find Tables & desks.

A page pops up with a variety of tables.

John: I need a computer table

Anna: Here you will find the Computer workstations.

A page pops up with computer workstation tables.

Figure 2.10: Example Conversation with Anna

38 http://www.speak2me.net

39 http://www.agentland.com

40 http://www.verbots.com

41 Kiwilogic. Virtual Service Agents http://www.kiwilogic.com/

42 http://193.108.42.79/cgi-bin2/IKEA USA.cgi


The conversation above shows that Anna keeps an internal state that records information such as names and remembers the most recent lines of input. The simplest selection method is the one used by simple reflex agents such as ELIZA. This mechanism uses regular expressions to identify the input and, possibly, to extract information using ELIZA's transformation rules, which can be expressed as:

((I'm (/NAME))
 (Thank you very much, NAME. Nice to meet you.))
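ELIZA-style decomposition and reassembly can be expressed with an ordinary regular expression. The Python sketch below is an assumed rendering of the rule above, extended with "my name is" and "I am" so that it also covers John's actual greeting in Figure 2.10; the names `NAME_RULE` and `greet` are illustrative, and the capitalisation test is our own heuristic.

```python
import re
from typing import Optional

# Decomposition rule: an introduction phrase followed by a capitalised word.
# (?i:...) makes only the phrase case-insensitive, so "I'm looking" is not
# mistaken for a name; this capitalisation test is our own heuristic.
NAME_RULE = re.compile(r"\b(?i:my name is|i'm|i am)\s+([A-Z][a-z]+)")
# Reassembly rule: the captured word replaces NAME.
REPLY = "Thank you very much, {name}. Nice to meet you."


def greet(user_input: str, state: dict) -> Optional[str]:
    match = NAME_RULE.search(user_input)
    if match is None:
        return None                  # rule does not apply; try other rules
    state["name"] = match.group(1)   # remembered for later turns
    return REPLY.format(name=state["name"])


state = {}
print(greet("Hi, my name is John", state))       # Thank you very much, John. Nice to meet you.
print(greet("I'm looking for a table.", state))  # None
print(state)                                     # {'name': 'John'}
```

Without the capitalisation check, a literal "I'm (NAME)" rule would happily greet the user as "looking", which illustrates how brittle pure pattern rules are.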

The conversation in Figure 2.10 also shows that Anna is able to store the current state, recording the topic given by the input "table". This context recognition works like the expected-response mechanism of the AIML architecture, which uses a "topic". Such a feature relies on only a very limited world model of IKEA furniture. Even though Anna has a commercial purpose, the applied techniques demonstrate nothing more advanced than what ELIZA did. As Fred Roberts [87] states, the Kiwilogic Editor does not build AI but rather commercial dialogue systems with a well-defined area of expertise, usually company FAQs and website navigation. Among the user inputs, therefore, there will be commonly asked questions that Anna has not yet been prepared to answer.
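The topic mechanism can be sketched in the same hedged way. In the Python example below, which is our own simplification and not Kiwilogic's implementation, a keyword both selects a catalogue page and sets a remembered topic, and a later keyword is interpreted within that topic. The two-entry `PAGES` catalogue is hypothetical and mirrors the exchange in Figure 2.10.

```python
from typing import Optional

# Hypothetical catalogue links: (current topic, keyword) -> page title.
# A topic of None marks keywords that can start a new topic.
PAGES = {
    (None, "table"): "Tables & desks",
    ("table", "computer"): "Computer workstations",
}


def reply(user_input: str, state: dict) -> str:
    words = user_input.lower().strip(" .!?").split()
    topic: Optional[str] = state.get("topic")
    if topic is not None:
        # First look for a keyword that refines the stored topic.
        for word in words:
            if (topic, word) in PAGES:
                return f"Here you will find the {PAGES[(topic, word)]}."
    for word in words:
        if (None, word) in PAGES:
            state["topic"] = word          # remember the topic for the next turn
            return f"On this page you will find {PAGES[(None, word)]}."
    return "Could you rephrase that?"


state = {}
print(reply("I'm looking for a table.", state))  # On this page you will find Tables & desks.
print(reply("I need a computer table", state))   # Here you will find the Computer workstations.
```

Every interpretation here is a table lookup, which is why such a feature only works inside a small, closed world like a furniture catalogue.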
2.4.3.2 Spleak


Spleak 43 is a virtual person created by Morten Lund at IMT Labs for instant message dialogues via Microsoft MSN Messenger. Spleak won second prize in Microsoft's Robot Invasion Competition in 2006 44. As a virtual person, Spleak can participate in textual dialogues and presents itself as a 21-year-old girl from New York. It can chat with people and has access to data such as encyclopedias, dictionaries, weather forecasts, news, and entertainment items such as horoscopes and textual games (black jack and hangman). Spleak also features a set of tools such as a calculator, unit conversion, spell checking, and web searching.


Like Anna, Spleak is a commercial product (subsidiary of the Microsoft Corporation), so insight into the techniques used is not available. We therefore evaluated Spleak's behaviour by chatting with her after adding her account (spleak@hotmail.com) to our MSN Messenger contact list, as shown in Figure 2.11. Spleak uses an information extraction technique to gather information about the human participant and can communicate with limited use of emoticons. Information such as name, age, and location is stored in a simple internal model for later use. A simple repetition test reveals whether a model of the discourse is kept. When we repeatedly asked the same question, Spleak did notice the loop:

John: are you there?

Spleak: I'm here!

John: are you there?

Spleak: Yup, ready as I'll ever be.

John: are you there?

Spleak: One more time, 0!!! I didn't get it the first three times...

Figure 2.11: Example Conversation with Spleak

43 http://www.spleak.com

44 http://www.robotinvaders.com






If two questions are asked repeatedly in an alternating sequence, Spleak does not notice. Thus Spleak does not keep a model of the whole discourse but at most remembers the last three inputs. Spleak also uses an aggressive strategy: it abandons questions that it does not understand and extracts information from the human participant by dominating the conversation with questions of its own. The intention of this topic shift is to avoid talking further about the user's interest. A similar mechanism is used in ELIZA, by eliminating words from the input, and in AIML, through its <random> <li> tags. This indicates that responses are chosen either on the basis of a matched regular expression or by a purely random mechanism. It is clear that no grammatical analysis of the input is performed.
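The observed behaviour is consistent with a very small memory that only checks for immediate repeats. The Python sketch below reproduces both observations under that assumption (a three-input window and an exact-match comparison, both our guesses since Spleak's internals are unpublished): identical questions are counted, alternating ones never are. The class `LoopDetector` is hypothetical.

```python
from collections import deque


class LoopDetector:
    """Counts immediate repeats using a short memory of recent inputs."""

    def __init__(self, window: int = 3):
        self.recent = deque(maxlen=window)  # older inputs are silently forgotten

    def consecutive_repeats(self, user_input: str) -> int:
        text = user_input.strip().lower()
        count = 0
        for previous in reversed(self.recent):  # newest first
            if previous != text:
                break
            count += 1
        self.recent.append(text)
        return count


same = LoopDetector()
print([same.consecutive_repeats("are you there?") for _ in range(3)])  # [0, 1, 2]

alternating = LoopDetector()
questions = ["are you there?", "how old are you?"] * 2
print([alternating.consecutive_repeats(q) for q in questions])         # [0, 0, 0, 0]
```

A bot built this way could escalate its reply as the count rises, as Spleak did, while remaining blind to any pattern longer than a single repeated line.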

Two commercial CAs have been investigated here: IKEA's virtual assistant Anna and IMT Labs' virtual person Spleak. Although no documentation is available for either, their behaviour indicates a reflex agent design with a very limited model, capable only of remembering simple information such as the name of the human participant. Unfortunately, the commercial approaches seem to be little more than direct imitations of ELIZA or ALICE, which could be the result of a limited budget or of reliance on tricks, as the following section outlines.

2.4.4 Tricks or AI 

Fifty-five years after Turing proposed the Imitation Game, the lesson of Weizenbaum's ELIZA program still stands: it demonstrated that “a simple computer program” could successfully play the Imitation Game by resorting to a few “tricks”, the most obvious being to answer questions with questions [32]. This is also evident from the Loebner competition [23], the popularity of CAs based on the AIML language [41], and the general lack of progress in text understanding and natural language dialogue systems. Cheating obviously has its limitations, and it is doubtful that good natural language interfaces can be built this way.


Some people interpret the TT as a setting in which you can "cheat": the game has no rules constraining the design of the machines. The TT in general, and the Loebner Prize in particular, reward tricks [88], and the winning programs of the last fifteen years have clearly incorporated some. This sort of qualitative assessment of program knowledge is exactly what the TT is trying to avoid, replacing the question “Can machines think?” with a performance test. Turing's Imitation Game is generally inadequate as a test of intelligence, as it relies solely on the ability to fool people. Perhaps the biggest obstacle to advancing in this area is that there are not many uses for fooling people besides the TT [89]. Fooling people can be achieved easily, as Weizenbaum found [90]. Here we describe and compare some of the better tricks, confident that when a computer program eventually does pass the TT, it will use many of them. There is also a simple reason for this: people already use them in everyday life, as depicted in Table 2.4.


Table 2.4: Conversation Agents' Tricks

ELIZA:
  Fostered by including substrings of the user's input in the program's output, e.g.
    User: You hate me.
    ELIZA: Does it please you to believe that I hate you?
  Use of the Rogerian mode, which provides unimpeachable cover for the computer.

PARRY:
  Admitting ignorance: "I don't know".
  Changing the level of the conversation: "Why do you ask that?"
  Introducing a new topic or launching into a new story (also used to simulate paranoid behaviour).
  Having many fragments of directed conversation stored in an activation network.

Other tricks:
  Using humorous statements to make the program seem more human.
  Agreeing with the user: in certain cases, the program can safely agree with the user rather than being non-committal.
  Simulated typing, with realistic delays between characters that imitate the rhythm of a person typing.
  Making longer replies, which seem more human-like.
  Introducing a new subject to divert the human user's attention.
  Fragments of directed conversation stored in an activation network (an "opening book").
  Controversial statements, e.g. "People don't own cats..."
  Excerpting news, weather forecasts, textual games, etc.
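The first two ELIZA tricks in Table 2.4 amount to copying part of the user's sentence back with the pronouns swapped, and falling back on a content-free prompt otherwise. The Python sketch below reproduces the table's example; the single "you ... me" rule, the `REFLECTIONS` table and the fallback reply are illustrative assumptions, not Weizenbaum's original script.

```python
import re

# Pronoun swaps applied to the fragment copied from the user's input.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are",
               "you": "I", "your": "my", "are": "am"}


def reflect(fragment: str) -> str:
    return " ".join(REFLECTIONS.get(word, word) for word in fragment.lower().split())


def eliza_reply(user_input: str) -> str:
    # Decomposition: "you <X> me"; reassembly reuses X, the user's own words.
    match = re.match(r"you (.*) me\.?$", user_input.strip(), re.IGNORECASE)
    if match:
        return f"Does it please you to believe that I {reflect(match.group(1))} you?"
    return "Please go on."   # content-free Rogerian fallback


print(eliza_reply("You hate me."))           # Does it please you to believe that I hate you?
print(eliza_reply("You really dislike me"))  # Does it please you to believe that I really dislike you?
```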


2.5 New Challenges 

In this thesis, the challenge of developing intelligent CAs rests not only on the central role of natural language, but also on the capability of CAs to gather knowledge and to act and speak like humans, as explained in the following sections.

2.5.1 Natural Language Understanding (NLU) 

The NLU field took off during the 1970s, beginning with Terry Winograd's SHRDLU [91, 92], a classical CA developed at the MIT Artificial Intelligence Laboratory in the late 1960s and embedded in a world of toy blocks. SHRDLU focused on natural language understanding and AI rather than on responding in a human-like fashion. Its domain was limited to a simulated world containing a table with a box and a number of coloured toy objects. The field of NLU is strongly connected to the field of AI. Its aim is to transform text into something that computers can “understand”, meaning that the computer should be able to answer natural language questions about the text and to reason about facts in different texts.

NLU paradigms use predicate logic as a semantic representation, as in the question answering systems LUNAR [93], AskJeeves [94], AnswerBus [95] and START [96]. The choice of representation is very important when it comes to choosing the best approach to building useful systems, and these systems highlight the strengths and weaknesses of different approaches for building further NLU systems. New challenges are being considered, including metrics for conversation naturalness and “meaningful” interaction with users.
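As a toy illustration in the spirit of SHRDLU's blocks world, the Python sketch below turns two restricted English question forms into predicate-logic queries over stored facts of the form on(x, y). The facts, question patterns and function names are hypothetical, and real systems such as LUNAR or START use far richer parsing and logical forms.

```python
import re

# A toy blocks world stored as ground facts of the predicate on(x, y).
FACTS = {
    ("on", "red_block", "table"),
    ("on", "green_pyramid", "red_block"),
    ("on", "blue_box", "table"),
}


def query(predicate: str, arg1=None, arg2=None) -> list:
    """Return facts matching predicate(arg1, arg2); None acts as a variable."""
    return [fact for fact in FACTS
            if fact[0] == predicate
            and (arg1 is None or fact[1] == arg1)
            and (arg2 is None or fact[2] == arg2)]


def answer(question: str) -> str:
    q = question.lower().strip(" ?")
    m = re.fullmatch(r"what is on the (\w+(?: \w+)?)", q)
    if m:  # "What is on the Y?"  becomes  on(X, Y)
        found = sorted(f[1] for f in query("on", None, m.group(1).replace(" ", "_")))
        return ", ".join(found) if found else "Nothing."
    m = re.fullmatch(r"is the (\w+ \w+) on the (\w+(?: \w+)?)", q)
    if m:  # "Is the X on the Y?"  becomes  on(X, Y)
        x, y = (g.replace(" ", "_") for g in m.groups())
        return "Yes." if query("on", x, y) else "No."
    return "I don't understand."


print(answer("What is on the table?"))                   # blue_box, red_block
print(answer("Is the green pyramid on the red block?"))  # Yes.
```

The point of the logical form is that both questions are answered by the same query mechanism, independent of how the facts were originally phrased.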

2.5.2 World Knowledge 

NLU is probably not achievable merely on the basis of linguistic knowledge, such as knowledge of a grammar, a lexicon or an ontology. It probably requires much world knowledge: not only semantic and pragmatic knowledge, but also what AI researcher Lenat [97] calls “commonsense knowledge”, i.e. lots of facts about the world, and perhaps also some specialised domain knowledge (e.g. knowledge of how to play chess). The computational linguist Terry Winograd [92], who pioneered the use of NLU for CA development, argued that NLU requires such knowledge.

Three popular commonsense knowledge bases are Cyc 45 [98], OpenMind 46 [99] and Mindpixel 47 [44]. The OpenMind Common Sense project differed from Cyc in that it represented the commonsense knowledge it collected as English sentences rather than in a formal logical structure. This means that the information did not use a standardised vocabulary with strict definitions for each component of the commonsense knowledge. In addition, the Cyc model and OpenMind had a drawback that prevented truly large-scale collaboration [100]. Mindpixel, another AI commonsense project, was created as a web-based collaborative AI project [100]. MindPixel builds its knowledge bases by accepting input from the general public rather than from knowledge experts. Some MindPixel data (GAC-80K) is being used by Cornell University [100] and The University of Memphis [101] to study theories of high-level reasoning. Other uses include Jackie [44] and ALICE Silver Edition [102], which simulate human conversation. Large-scale public collaboration in collecting commonsense knowledge will therefore play an important role in the development of CA knowledge bases.

45 http://www.cyc.com

46 http://www.openmind.org

47 http://www.mindpixel.com

2.5.3 Human-machine Interface 

In the search for the next-generation user interface, there has been growing interest in developing CA interfaces not only on the web but also on instant messaging and mobile services. During the past decade there have been rapid advances in CAs, spoken language technology and multimodal interfaces [103-107], which have stimulated interest in a new class of conversational interfaces. As observed in the literature, researchers in this discipline have proposed different techniques and produced several natural language conversation systems.

Moreover, CAs represent the convergence of new interface agents, spoken language and human-computer dialogue systems. As CAs become more realistic, users expect to be able to interact with them in natural language.

2.6 Summary 

This chapter has given an overview of CA developments and described human-machine conversation. The Turing Test nevertheless remains an issue in many discussions on AI. The Loebner Competition has been held every year since 1991, and based on the transcripts 48 of these events, it is apparent that the techniques used have not much changed, improved, or progressed [18, 108, 109].

One additional problem with the TT, besides its reliance on a limited definition of intelligence, is related to natural language processing, which becomes far more complex when one tries to develop a computer program to converse with a human. Even award-winning CAs, including ALICE, which has been judged the most human-like entrant in the Loebner Prize competition, have quirks that reveal them as non-human within the first five or six sentences of a conversation [38]. Jabberwacky comes a little closer to human language since it learns how to converse from previous user interactions. Anna and Spleak, two commercial CAs, serve well as business identities, but their dialogues suggest that their responses are no more advanced or human-like than those of ELIZA and ALICE. Certainly, understanding unbounded natural language is one of the challenges within this study, but it is not the sole research focus. The key contribution of this thesis is the development of a CA framework offering interoperability, scalability and modularity, and the evaluation of that framework. There are, moreover, several fine-grained conversation knowledge bases, such as AAA and Mindpixel, which make the processing of natural language tractable. These knowledge bases have been incorporated in the proposed framework. This facilitates the development of the framework and allows it to focus on practical aspects such as machine understanding, a natural man-machine interface and equipping the CA with world knowledge, as detailed in Chapter 3.

One of the features of the proposed man-machine interface is the CA's ability to communicate via the Web, mobile services and instant messaging (IM) systems. The interface also aims to improve human-machine interaction with lifelike avatars and emotional expressions as embellishments of the system. As shown in Table 2.2, none of the previously developed CAs has speech synthesis, and only ALICE, Jabberwacky and Anna have a lifelike avatar, in the form of an animated ‘talking head’ without speech. While some users may be content with a ‘text only’ interface, others may be more comfortable with an animated avatar that provides visuals and sound. In the system proposed in this study, plug-in modules can be used to enhance the interface of a CA based on a flexible and modular approach, as described in the next chapter.

48 http://www.loebner.net/Prizef/loebner-prize.html
CHAPTER 3


CONVERSATION AGENTS FRAMEWORK DESIGN 


3.1 Introduction 

In this chapter, the challenges of developing reusable, extensible, scalable and modular software for CA frameworks are summarised. Solutions to these challenges are illustrated and discussed through the development of the CA called Artificial Intelligence Natural-language Identity (AINI). Most other CA designs are based on a two-tiered or three-tiered approach. The main contribution of this chapter is the proposed AINI framework, whose modular design is based on a modified N-tiered architecture similar to the current Service-Oriented Architecture (SOA). Two additional tiers are proposed in the AINI framework: the channel service tier and the domain service tier. They can be implemented in a number of ways to provide flexibility and scalability for different application domains and tasks, as demonstrated by two specific application domains, SARS [110, 111] and bird flu [112, 113]. This chapter also describes and exemplifies the underlying technologies used in this thesis.
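The idea of swappable channel and domain tiers can be sketched in Python with structural interfaces. This is a minimal illustration under our own assumptions, not AINI's implementation: the class names, the FAQ-table domain and the two sample answers are hypothetical.

```python
from typing import Protocol


class ChannelService(Protocol):
    """Channel tier: adapts replies to one medium (web, IM, SMS, ...)."""
    def format_reply(self, text: str) -> str: ...


class DomainService(Protocol):
    """Domain tier: answers questions from one knowledge base."""
    def can_answer(self, query: str) -> bool: ...
    def answer(self, query: str) -> str: ...


class WebChannel:
    def format_reply(self, text: str) -> str:
        return f"<p>{text}</p>"


class SmsChannel:
    def format_reply(self, text: str) -> str:
        return text[:160]  # a single SMS carries at most 160 GSM 7-bit characters


class FaqDomain:
    """A domain plug-in backed by a simple question -> answer table."""
    def __init__(self, faq: dict):
        self.faq = {self._key(q): a for q, a in faq.items()}

    @staticmethod
    def _key(query: str) -> str:
        return query.lower().strip(" ?!.")

    def can_answer(self, query: str) -> bool:
        return self._key(query) in self.faq

    def answer(self, query: str) -> str:
        return self.faq[self._key(query)]


class ConversationAgent:
    """Core tier: asks each domain plug-in in turn, then formats for the channel."""
    def __init__(self, domains: list, fallback: str = "Sorry, I don't know."):
        self.domains = domains
        self.fallback = fallback

    def handle(self, query: str, channel: ChannelService) -> str:
        for domain in self.domains:
            if domain.can_answer(query):
                return channel.format_reply(domain.answer(query))
        return channel.format_reply(self.fallback)


sars = FaqDomain({"What is SARS?": "SARS is severe acute respiratory syndrome."})
bird_flu = FaqDomain({"What is bird flu?": "Bird flu is avian influenza."})
agent = ConversationAgent([sars, bird_flu])
print(agent.handle("what is bird flu", WebChannel()))  # <p>Bird flu is avian influenza.</p>
print(agent.handle("What is SARS?", SmsChannel()))     # SARS is severe acute respiratory syndrome.
```

In this design, adding a new application domain or delivery channel means writing one more class; the core `ConversationAgent` is left untouched.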

3.2 Conversation Agents Framework 

Developing reusable CA software is particularly difficult because there is no universally agreed definition of what a CA is. Apart from the proprietary commercial CAs, most CAs have been built on an ad-hoc basis with the sole aim of passing the Turing Test (TT) or entering the Loebner Prize. Over the fifty years since the birth of AI, CAs have taken many forms. Some CA researchers have aspired to develop embodied CAs, known as embodied conversation agents (ECAs) [103-106]. Other CAs were designed for particular domains such as edutainment, medicine and customer relationship management (CRM) [67, 69, 114-116], or for crisis communication applications [117, 82]. On the human-computer interface (HCI) side, these CAs took the form of Computer Mediated Communication (CMC) using instant messages, e-mails and chat rooms [118-123]. The communication channels and technologies could be both wired and mobile services, including SMS, MMS, WAP and GPRS [124-126]. In terms of expression, these CAs may use verbal (speech synthesis, voice recognition) or non-verbal (e.g. text-based, emoticons) [127, 128] means, and even reconfigurable animated characters [10, 59, 129, 130] with different faces, hands, heads, clothes, genders or other accessories.

The design of modern CAs therefore requires a better structure for their implementation.
The goal is to develop a practical framework that improves the interoperability of the CA
architecture through modular design, as illustrated in the AINI framework [112, 113,
117]. Finally, the proposal is to build the CA on an established and stable multi-agent
framework, which reduces the need to rewrite the complete application. The attributes
required of such a framework can be summarised as follows:

3.2.1 Reusability 

Framework reusability implies the reuse of the domain knowledge, interface design and 
query engine in order to avoid recreating and revalidating common solutions to 
recurring application requirements and software design challenges. Developing reusable
frameworks is difficult and requires considerable effort, but it is generally believed that
the work pays off in the end. Reuse of framework design components can yield
substantial improvements in productivity, as well as enhancing the quality, performance,
reliability, and interoperability of software.
3.2.2 Modularity

In the AINI CA design, the framework enhances modularity by encapsulating
implementation details, and it helps improve software quality by localising the impact of
design and implementation changes. A modular architecture allows plug-and-play
(plug-in) modules to be combined into software variations that meet a specific design.
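To make the plug-in idea concrete, the following Python sketch shows one way a registry can hold interchangeable modules behind a single interface, so that a module can be added or swapped without editing its callers. It is a minimal sketch under stated assumptions: the class `PluginRegistry` and the two toy output modules are hypothetical and are not taken from AINI's Perl implementation.

```python
from typing import Callable, Dict, List

# Hypothetical plug-in signature: take the agent's reply text and return it
# rendered for a particular interface. Not AINI's actual interface.
PluginFn = Callable[[str], str]


class PluginRegistry:
    """Holds interchangeable modules behind one stable interface."""

    def __init__(self) -> None:
        self._plugins: Dict[str, PluginFn] = {}

    def register(self, name: str) -> Callable[[PluginFn], PluginFn]:
        # Used as a decorator, so a new module plugs in without edits
        # to the registry or to any caller.
        def decorator(fn: PluginFn) -> PluginFn:
            self._plugins[name] = fn
            return fn
        return decorator

    def run(self, name: str, text: str) -> str:
        if name not in self._plugins:
            raise KeyError(f"no plug-in registered under {name!r}")
        return self._plugins[name](text)

    def names(self) -> List[str]:
        return sorted(self._plugins)


registry = PluginRegistry()


@registry.register("text_only")
def text_only(reply: str) -> str:
    return reply


@registry.register("emoticon")
def emoticon(reply: str) -> str:
    return reply + " :)"


if __name__ == "__main__":
    print(registry.names())                               # ['emoticon', 'text_only']
    print(registry.run("emoticon", "Good to hear that!"))  # Good to hear that! :)
```

Registering a third module only requires another decorated function; neither `PluginRegistry` nor the code that calls `run` changes, which is the localising of change described above.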

3.2.3 Extensibility 

Extensibility in the AINI framework design is essential to ensure the timely
customisation of new application services and features. A framework enhances
extensibility by providing the means for applications to extend its interfaces and
functionalities. These extension points, or service modules, systematically decouple the
interfaces from the behaviour of the application domain. For instance, AINI uses XML
because the standard has gained widespread support and is widely accepted by the
community as a knowledge and message representation language.
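As an illustration of XML as a knowledge representation, the sketch below reads AIML (the XML dialect used by ALICE-derived bots such as the Program D and Program dB systems discussed later in this chapter) with Python's standard `xml.etree.ElementTree` module. The two categories are invented for the example (the reply text borrows from Figure 3.15), and nested AIML tags such as `<srai>` or `<star/>` are deliberately ignored to keep the sketch short.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Two invented AIML categories; the replies echo Figure 3.15.
AIML_SOURCE = """
<aiml version="1.0">
  <category>
    <pattern>WHAT IS YOUR NAME</pattern>
    <template>My name is Aini. What is your name?</template>
  </category>
  <category>
    <pattern>HELLO</pattern>
    <template>Hi there :)</template>
  </category>
</aiml>
"""


def load_categories(source: str) -> Dict[str, str]:
    """Map each AIML pattern to its template text."""
    root = ET.fromstring(source.strip())
    categories: Dict[str, str] = {}
    for category in root.iter("category"):
        # findtext returns only the element's leading text, so nested
        # tags such as <srai> or <star/> are ignored in this sketch.
        pattern = (category.findtext("pattern") or "").strip().upper()
        template = (category.findtext("template") or "").strip()
        if pattern:
            categories[pattern] = template
    return categories


if __name__ == "__main__":
    knowledge = load_categories(AIML_SOURCE)
    print(knowledge["HELLO"])  # Hi there :)
```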

3.2.4 Scalability 

In the AINI design, the framework must provide scalability features whereby the design
can be modified without necessarily affecting other modules or the rest of the
architecture. This relates to:

• Conversation module

It should be possible to add new methods to the conversation module by providing new
services to the behaviour modules of the CAs. This module supports web-based,
mobile-based or other network-based communication protocols.

• Number of conversations and participants

The system should be able to accommodate an increasing number of participants
and to keep track of multiple conversations without modification of the architecture.
In addition, the conversation logs should keep records of the participants'
conversations. This information could be kept for future analysis or to meet
compliance requirements.
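A minimal sketch of the second requirement, assuming an in-memory store: each conversation is keyed by a session identifier, so new participants add new keys rather than new code, and every turn is timestamped so that the log can be kept for later analysis. The class name and the `(timestamp, speaker, text)` record layout are assumptions for illustration; a deployed system would persist the log in a database.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import DefaultDict, List, Tuple

# One log record: (UTC timestamp, speaker, text). Layout is an assumption.
Turn = Tuple[str, str, str]


class ConversationLog:
    """Tracks many concurrent conversations, one list of turns per session."""

    def __init__(self) -> None:
        self._turns: DefaultDict[str, List[Turn]] = defaultdict(list)

    def record(self, session_id: str, speaker: str, text: str) -> None:
        stamp = datetime.now(timezone.utc).isoformat()
        self._turns[session_id].append((stamp, speaker, text))

    def active_sessions(self) -> List[str]:
        return sorted(self._turns)

    def transcript(self, session_id: str) -> List[str]:
        # .get avoids creating an empty entry for an unknown session.
        return [f"{who}: {text}" for _, who, text in self._turns.get(session_id, [])]


if __name__ == "__main__":
    log = ConversationLog()
    log.record("web-1", "Ahmad", "what's ur name?")
    log.record("sms-7", "Siti", "CONNECT SITI")  # hypothetical second user
    log.record("web-1", "AINI", "My name is Aini. What is your name?")
    print(log.active_sessions())  # ['sms-7', 'web-1']
    print(log.transcript("web-1"))
```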

3.3 Conversation Agents’ Features 

There are many challenges in the development of AINI. These include the design itself,
the flexibility and extensibility of the system, maintaining simplicity without being
simplistic, and long-term maintainability. The real challenge is to provide a framework in
which a developer can work on the CA system without having to understand all the
modules within the system. The challenge is also to find a delicate balance among these
elements with regard to the following aspects.

3.3.1 Modules Integration 

The most challenging element in developing conversation software is integrating all the
modules through a plug-and-play architecture. To keep the complexity of the system
manageable, and to simplify the testing and maintenance of the various modules, it is
important to minimise code duplication across the domains.

3.3.2 Domain Independence

Another challenge is the inherently complex multi-domain nature of the AINI system.
Developing a generic framework requires continuous refactoring of elements common to
multiple domains. In addition, building world knowledge and domain-specific
knowledge for each application and task is labour-intensive; therefore, automated tools
are needed to manage this process.

3.3.3 Cross-Platform 

A further challenge in developing CA software is choosing an appropriate software
platform, including the operating system, browsers and programming language [131].
CAs can be implemented on multiple computer platforms such as Windows, Linux or
Mac OS. The solution stack adopted in this study combines Linux, Apache, MySQL and
Perl, known as “LAMP”. These are free, open-source programs that offer a
non-proprietary, flexible way to create client-server applications, and the combination is
used primarily to build web server infrastructure.

Classical CAs have mainly been written in languages such as Lisp, Prolog, Icon, Bash, C,
Haskell and Curl, and do not rely on a client/web-server architecture. Although the
Java-based Program dB [132] won second place in the Loebner Prize, it has the
disadvantage of hard-coding a local file path in the AIML file. Program dB is based on a
two-tiered architecture: the targeting server and the Loebner contest text-based interface.
Program dB has no "long term memory" and forgets all its clients each time it is shut
down. This is partly the result of the lack of a database, although the approach appears
to improve the overall performance of the server. In this study, CA development was
based on cross-platform programming and the LAMP solution, which works on multiple
platforms. Our principle was that the language must be simple, small in size, fast and
powerful [133]. Perl made both imperative and object-oriented programming concepts
much easier to grasp and use, which is one of its great strengths.
3.4 N-tiered Architecture


In recent years there has been dramatic growth in distributed computing environments.
Unfortunately, the size and variety of heterogeneous computer systems make distributed
computing difficult and complex [134]. To make systems interoperable, middleware is
usually required. Middleware is software that encapsulates the architectural details of a
system and offers its services to other systems on heterogeneous platforms across a
network [135].

In the 1980s, the arrival of inexpensive network-connected personal computers
produced the popular two-tiered client-server architecture [136]. In a two-tiered
architecture, every time the business rules are modified, the client application has to be
changed, tested and redistributed, even when the user interface remains intact. The
two-tiered architecture therefore suffers from scalability and modularity issues.

The three-tiered architecture adds a third tier (the application server tier) between the
user interface (client tier) and the data management (data server tier) components. This
middle tier provides process management: it executes the business logic and rules, and
it can accommodate hundreds of users by providing functions such as queuing,
application execution and database staging. The three-tiered architecture is used when
an effective distributed client/server design is needed to provide improved performance,
flexibility, maintainability, reusability and scalability [137-139]. A comparison between
two-tiered and three-tiered client-server architectures is shown in Figure 3.1.
Figure 3.1: Comparison of Two-tiered and Three-tiered Client-Server Architectures


The three-tiered system architecture is more commonly referred to as N-tiered
architecture, in reference to the unlimited number, N, of intermediary layers between the
client and the server, as popularised by Bonasso [140]. N-tiered applications are highly
scalable and can offer good performance. They can support any combination of user
interfaces (web browser, mobile interface, network interface and others) while offering
high potential for code reuse and sharing.
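The separation of concerns described in this section can be sketched in a few lines of Python. In this hypothetical example the client tier only formats output, the application tier holds the (toy) business rule, and the data tier only stores facts; changing the rule in `ApplicationTier.answer` therefore requires no change to `ClientTier`, which is the advantage over the two-tiered design noted above. All class names and the sample fact are assumptions.

```python
from typing import Dict, Optional


class DataTier:
    """Data server tier: stores facts and knows nothing about presentation."""

    def __init__(self) -> None:
        # Hypothetical sample fact.
        self._facts: Dict[str, str] = {"opening hours": "9am to 5pm"}

    def lookup(self, key: str) -> Optional[str]:
        return self._facts.get(key)


class ApplicationTier:
    """Application server tier: the only place business rules live."""

    def __init__(self, data: DataTier) -> None:
        self._data = data

    def answer(self, question: str) -> str:
        # Toy rule: normalise the question into a lookup key.
        key = question.lower().strip(" ?")
        fact = self._data.lookup(key)
        return fact if fact is not None else "Sorry, I do not know that yet."


class ClientTier:
    """Client tier: formats input and output, nothing else."""

    def __init__(self, app: ApplicationTier) -> None:
        self._app = app

    def ask(self, question: str) -> str:
        return f"AINI> {self._app.answer(question)}"


if __name__ == "__main__":
    client = ClientTier(ApplicationTier(DataTier()))
    print(client.ask("Opening hours?"))  # AINI> 9am to 5pm
```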


3.5. AINI’s Conversation Agent Architecture 

N-tiered architecture has been deployed in numerous domain applications and agencies,
such as the Human Capital Management system (HCM) [141], DARWIN [142] designed by
NASA, the SLAM 3DSim mobile robot [143], Urban Search and Rescue [144], the Carnegie
Mellon Navigation (CARMEN) Toolkit [145], CoCo, an autonomous agent [146], spoken
dialogue [147], the mobile service software agent Open APIs (Parlay/JAIN) [148], a
storytelling robot [149], Learn Sesame [150] and Intelligent ChatBot [151], to name a
few.

CAs such as TARA (Terrorism Activity Resource Application) [83, 84] were developed on
a two-tiered architecture based on Program D [152], the original Java implementation of
the Loebner Prize-winning ALICE. Although Program D can be configured with an
unlimited number of CAs, their handwritten AIML knowledge bases are hard-coded in
XML and precompiled with Java, instead of being stored in a flexible relational database
[153]. The same approach was followed by the CMU Nursebot, Pearl [78, 79], as well as
OpenMindBot [80], CatBot [64], TutorBot [65], Partner (SP) [66, 67], ChatBot [151] and
Persona-AIML [68]. However, VPbot [74, 75], the Harvard Medical School’s Virtual
Patient program, overcomes this limitation by deploying Program E [154]. Program E is
written in the PHP programming language, which supports multi-tier applications.
VPbot makes use of a relational database by loading the AIML files into a MySQL
database.
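The benefit of VPbot's approach can be illustrated with a small sketch that stores AIML pattern-template pairs in a relational table and answers by SQL lookup. Python's built-in `sqlite3` module with an in-memory database stands in for MySQL here, the table name `aiml` and its two columns are assumptions rather than Program E's actual schema, and only exact pattern matches are handled (AIML wildcards such as `*` are not).

```python
import sqlite3
from typing import Iterable, Optional, Tuple


def load_into_db(conn: sqlite3.Connection,
                 categories: Iterable[Tuple[str, str]]) -> None:
    """Store (pattern, template) pairs; the table layout is an assumption."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS aiml ("
        "pattern TEXT PRIMARY KEY, template TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO aiml (pattern, template) VALUES (?, ?)",
        categories,
    )
    conn.commit()


def normalise(user_input: str) -> str:
    # Upper-case and drop punctuation, as AIML matchers commonly do.
    kept = "".join(ch for ch in user_input.upper() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def reply(conn: sqlite3.Connection, user_input: str) -> Optional[str]:
    """Exact-match lookup; returns None when no category matches."""
    row = conn.execute(
        "SELECT template FROM aiml WHERE pattern = ?", (normalise(user_input),)
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")  # stand-in for a MySQL server
    load_into_db(db, [("WHAT IS YOUR NAME", "My name is Aini. What is your name?")])
    print(reply(db, "What is your name?"))
```

Because the knowledge lives in a table rather than in compiled code, categories can be added or corrected with ordinary SQL statements, and with a file-backed database they survive a restart, which addresses the missing "long term memory" noted for Program dB in Section 3.3.3.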

While many techniques and programming languages have been proposed over the years
for developing CAs, the primary challenge remains how to overcome poor scalability and
the lack of flexibility in handling heterogeneous CA software designs. The proposed
approach is therefore to employ a novel, modified N-tiered architecture similar to the
service-oriented application architecture, capturing information at decreasing levels of
granularity in the agent brain (application server tier), the agent body (client tier) and the
agent knowledge (data server tier). This architecture enables us to handle dynamic
features characteristic of scripting languages, such as dynamic typing and code inclusion,
which have not been adequately addressed by previous techniques.

3.5.1 AINI’s Modified N-tiered Architecture

The key to success in future conversation systems is to develop general dialogue
systems [155], that is, frameworks that are not designed for a particular application but
can be ‘plugged’ into various applications. Such a framework should require minimal
effort to develop the domain knowledge and to adapt to different applications. These
considerations have led to the proposal of two additional tiers to handle the generality
issue in the CA framework: the Channel Service Tier and the Domain Service Tier, as
shown in Figure 3.2.


[Figure: the tiers, from top to bottom, are the Client Tier (Agent’s Body), the Channel Service Tier (Middleware), the Application Server Tier (Agent’s Brain, containing Rules 0 to 5), the Domain Service Tier (Middleware) and the Data Server Tier (Agent’s Knowledge).]

Figure 3.2: AINI’s Modified N-tiered Architecture


These additional tiers contain all the logic needed to receive and delegate requests. As
discussed in the previous section, the three-tiered architecture is lacking in two major
respects. First, it combines communication logic with business logic in the application
server tier, handling all requests through a single interface to access services. Second,
clients and developers face potential issues with the communication channels. To avoid
these problems, the Channel Service Tier is introduced, as it allows the creation of a
flexible and extensible architecture that supports different interfaces and additional
services more easily. The Domain Service Tier supports a flexible and extensible
architecture capable of providing any domain application requested by a client without
changing the domain services.

3.5.1.1 Channel Service Tier 

Building extensible and scalable systems is a challenging task. Currently, different
clients communicating over different channels or protocols, such as HTTP or TCP, must
be offered the same set of application services by the system. In addition, a current
deficiency of CA architectures is the lack of universally agreed standards. In general, the
number of channels and the number of requested services are expected to increase over
time, with a gradual shift from web-based to mobile-based applications. AINI’s proposed
modified N-tiered architecture could be a viable solution to these future challenges.
Because the application tier has to satisfy requests from clients such as web browsers,
mobile browsers and MSN Messenger interface applications, it must deal with service
requests arriving over different channels, each characterised by its own communication
protocol. A server residing on the application tier must therefore be able to support all of
these communication protocols.
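One way to picture the channel service tier is as a set of adapters that convert each channel's native request into a single, channel-neutral message before it reaches the agent brain. The sketch below assumes hypothetical request formats (an HTTP form with `user` and `q` fields, an SMS sender and body, and a `handle|text` string from an IM gateway); none of these are AINI's actual wire formats.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Message:
    """Channel-neutral request handed to the agent brain."""
    channel: str
    user: str
    text: str


# Hypothetical adapters: each turns one channel's native request into a Message.
def from_web(form: Dict[str, str]) -> Message:
    return Message("web", form["user"], form["q"])


def from_sms(sender: str, body: str) -> Message:
    return Message("sms", sender, body.strip())


def from_im(raw: str) -> Message:
    handle, _, text = raw.partition("|")
    return Message("im", handle, text)


class ChannelServiceTier:
    """Routes any registered channel to the same brain function."""

    def __init__(self, brain: Callable[[Message], str]) -> None:
        self._brain = brain
        self._adapters: Dict[str, Callable[..., Message]] = {}

    def add_channel(self, name: str, adapter: Callable[..., Message]) -> None:
        self._adapters[name] = adapter

    def handle(self, channel: str, *payload: object) -> str:
        message = self._adapters[channel](*payload)
        return self._brain(message)


def toy_brain(message: Message) -> str:
    # Placeholder for the agent brain; it never sees channel details.
    return f"Hello {message.user}, you said: {message.text}"


if __name__ == "__main__":
    tier = ChannelServiceTier(toy_brain)
    tier.add_channel("web", from_web)
    tier.add_channel("sms", from_sms)
    tier.add_channel("im", from_im)
    print(tier.handle("web", {"user": "Ahmad", "q": "hi"}))
    print(tier.handle("im", "ahmad@example.com|hi"))
```

Supporting a new channel means registering one more adapter, while the brain function is untouched.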

3.5.1.2 Domain Service Tier 

Under the standard N-tiered architecture, the application tier and the data access tier
offer some flexibility and scalability, but they are not independent of the domain
knowledge. The suggestion made here is to create an additional layer, called the
“domain service tier”, to resolve these coupling issues. The role of the domain service
tier is to handle concurrent access and to manage changes to the domain knowledge tier.
The domain service tier provides its own set of services which override methods or
features of the database using SQL queries. It is also responsible for choosing either
domain-specific or open-domain knowledge (discussed in section 3.5.3). This solution
increases the cohesion of each tier and prevents the overall system from becoming
excessively complex. In addition, it assists the development of domain applications. For
instance, to deploy a furniture domain application such as the Anna CA used by IKEA
(see section 2.4.3 Commercial CAs), only the domain-specific knowledge needs to be
updated, instead of the entire data server tier.
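The selection rule just described (domain-specific knowledge first, open-domain knowledge as a fallback) might be sketched as follows. Plain Python dictionaries stand in for the SQL-backed knowledge bases, and the class, method and domain names are hypothetical; the point is that plugging in a new domain, such as the furniture example, touches only that domain's entry.

```python
from typing import Dict, Optional


class DomainServiceTier:
    """Tries the requested domain first, then falls back to open-domain."""

    def __init__(self, open_domain: Dict[str, str]) -> None:
        self._open_domain = open_domain
        self._domains: Dict[str, Dict[str, str]] = {}

    def plug_domain(self, name: str, knowledge: Dict[str, str]) -> None:
        # Deploying or replacing a domain touches only this entry.
        self._domains[name] = knowledge

    def lookup(self, domain: str, key: str) -> Optional[str]:
        specific = self._domains.get(domain, {})
        if key in specific:
            return specific[key]
        return self._open_domain.get(key)


if __name__ == "__main__":
    tier = DomainServiceTier({"greeting": "Hi there :)"})
    tier.plug_domain("furniture", {"greeting": "Welcome! Looking for furniture?"})
    print(tier.lookup("furniture", "greeting"))  # domain-specific answer
    print(tier.lookup("sars", "greeting"))       # open-domain fallback
```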


The AINI conversation architecture has been reported in previous publications [117,
156]. As illustrated in Figure 3.3, AINI employs a modified N-tiered architecture that
can be configured to work with any web, mobile or other network applications. It
comprises a client tier (hereafter called “agent body”), an application server tier
(hereafter called “agent brain”) and a data server tier (hereafter called “agent
knowledge”).



[Figure: AINI’s client tier, application server tier and data server tier.]

Figure 3.3: AINI’s Conversation Agent Architecture
As mentioned previously, the channel service tier and the domain service tier are
designed to support different interface channels and can quickly be augmented with
domain knowledge for specific purposes. The architecture provides a multimodal
interface, multilevel natural language query and multiple knowledge bases. The process
of communication and answering is as follows. Given a question, AINI’s agent body
sends an HTTP request over TCP from a Web, mobile or instant messaging service to
the agent brain. The agent brain then attempts to formulate a reply via the Natural
Language Understanding and Reasoning Module. This module parses the sentence into
grammatical categories, such as noun phrases and verb phrases, and grammatical
relationships, extracting pertinent information with the help of the agent’s knowledge. If
parsing succeeds, network-to-path reduction is carried out by Network-based advanced
reasoning. In the resulting query network, the question is decomposed into sequences of
words or phrases that form the nodes of a tree. Once the query network is obtained, the
task of answering the question is reduced to discovering the presence of the query
network within the whole semantic network. The appropriateness of the answer is
ensured by the logic and network approach during answer discovery, using a
template-based approach: the template method matches the answer against the question
templates requested by the user. The system then presents the answer through the agent
body. Each of the agent’s tasks is described in further detail in the following sections.
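The reduction of question answering to finding the query network inside the semantic network can be illustrated with a generic triple-pattern matcher. This is a swapped-in, simplified technique rather than AINI's actual reasoning module: it assumes the question has already been parsed into `(subject, relation, object)` edges, with terms beginning with `?` acting as unknowns, and it returns every variable binding under which all query edges occur in the network. The toy network and relation names are assumptions.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Toy semantic network; facts and relation names are illustrative only.
NETWORK: Set[Triple] = {
    ("sars", "is_a", "disease"),
    ("sars", "caused_by", "coronavirus"),
    ("bird flu", "is_a", "disease"),
    ("bird flu", "caused_by", "influenza a virus"),
}


def is_var(term: str) -> bool:
    return term.startswith("?")


def match_edge(pattern: Triple, fact: Triple, binding: Binding) -> Optional[Binding]:
    """Unify one query edge with one network edge, extending the binding."""
    extended = dict(binding)
    for p, f in zip(pattern, fact):
        if is_var(p):
            if extended.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return extended


def find_answers(query: List[Triple], network: Set[Triple],
                 binding: Optional[Binding] = None) -> List[Binding]:
    """Every binding under which all query edges occur in the network."""
    binding = {} if binding is None else binding
    if not query:
        return [binding]
    first, rest = query[0], query[1:]
    results: List[Binding] = []
    for fact in sorted(network):  # sorted for a deterministic output order
        extended = match_edge(first, fact, binding)
        if extended is not None:
            results.extend(find_answers(rest, network, extended))
    return results


if __name__ == "__main__":
    # "What causes SARS?" parsed (by assumption) into one query edge.
    print(find_answers([("sars", "caused_by", "?x")], NETWORK))
    # "Which disease is caused by a coronavirus?" as two edges sharing ?d.
    print(find_answers([("?d", "is_a", "disease"),
                        ("?d", "caused_by", "coronavirus")], NETWORK))
```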

3.5.2 Agent Body (Client Tier) 

The user interface, or human-computer interface (HCI), resides in the agent body and
supports three types of communication channel, namely WebChat, MobileChat and
MSNChat, controlled by the channel service tier. AINI uses HTTP over TCP to connect to
the Internet and mobile services to communicate with users.

The user interface was written using HTML, JavaScript, VBScript, XML, WML, Flash
ActionScript and other client-side scripting languages. For MSNChat, AINI connects to
the MSN Messenger client through the MSN Mobile Gateway, as shown in Figure 3.3.

3.5.2.1 WebChat 

WebChat is a web-based interface that allows users to interact with AINI in real time
through a range of browsers, including Internet Explorer, Mozilla Firefox, Safari, Opera
and Netscape. In recent years, CA research [104, 157, 158] has emphasised the
‘embodiment’ of interactive virtual characters or web robots [159], and significant
progress has been made in software architectures [160] [161]. To foster a relationship
with the user and to encourage the user to interact with the system in a natural manner,
an anthropomorphic agent or avatar can be incorporated into AINI’s framework. Nass
and Reeves [162] from Stanford University strongly suggested that people tend to treat
computers as human entities, which in turn results in human-based social responses.
Cassell et al. [158] pointed out that conversational computer agents have played very
important roles in human-computer interaction by drawing on human conversational
skills. Such skills include the ability to use the face, hands and tone of voice to regulate
the process of conversation, as well as the ability to use both verbal and non-verbal
means. To this end, Microsoft Agent Technology [163], Flash Technology [164] and
Oddcast Technology 49 [165] are used. These technologies introduce an animated,
speech-enabled avatar that serves as the medium between the system and the user. In
addition, users can browse all the information on the website about the topics they are
interested in. At the same time, they can put questions to AINI for more information or
guidance.

49 Crisis communication research on SARS was supported by Oddcast Inc. in 2003, and research on Bird
Flu in 2005. In these projects, Oddcast provided a streaming text-to-speech (TTS) voice solution to make
our ECAs more presentable, together with an interface that allowed us to create customised animated
characters and embed them within AINI's framework.
Another advantage of WebChat is a collaborative browser which allows a portal to guide
users through the organisation’s website by automatically “pushing” URLs and
information from other websites to the user’s browser. This not only facilitates
communication between CAs and users, but also allows the intelligent CA to help users
locate specific information on their websites. In addition, users are able to personalise
their WebChat interface by customising the avatars, for example by choosing the
avatar’s gender, voice, face, clothing or accessories, as shown in Figure 3.4. As
described in section 3.2.4, AINI’s architecture is scalable and can be embedded into any
existing website, such as a banking or education portal, and can even be integrated into
a chatroom without changing its code, as shown in Figure 3.5.



[Panels: CCNet Crisis Communication website; IW3D (Intelligent Web 3D) website; “Choose your character, hair style, clothes, glasses, etc. with Flash Technology”; “Choose your favourite 3D avatars with Microsoft Agent Technology”.]

Figure 3.4: Personalised User Interface
[Panels: an illustration of how AINI could be embedded in a Malaysian banking portal; an illustration of how AINI could be embedded in the Murdoch University website with an avatar; an illustration of AINI integrated into a web-based chatroom interface.]

Figure 3.5: Examples Illustrating AINI’s Scalable Interface


3.5.2.2 MobileChat 

Chat through messaging applications was among the first and most successful
community applications for mobile services such as SMS, WAP, GPRS and 3G, extended
by Web services. In AINI’s framework, users can freely select the CA they prefer to
access, so they can chat anywhere, at any time, with any device. A mobile chatting
module is implemented in a series of logical phases. According to the Research and
Markets report [124], text-based agent-to-mobile, agent-to-Internet and
Internet-to-mobile chats are likely to become popular and to be implemented in the
future. Today, around one trillion text messages are sent each year, and this number is
growing. AINI’s modified N-tiered architecture is ready to support this technology, as
described in section 3.5.1.1. AINI’s MobileChat module consists of the SMSChat,
WAPChat and PDAChat sub-modules, discussed below:


• SMSChat and WAPChat 


Mobile chat is an alternative method by which users can chat with AINI using SMS,
MMS, GPRS or 3G services. SMSChat is a text-based chat service. MMS is a service
similar to SMS but with added image, voice, animation and other features. Meanwhile,
WAP technology provides mobile web browsing for accessing news and other data
services by connecting to the WAP gateway at a given URL. SMSChat and WAPChat
provide text-based interactive information services and applications on the screens of
users' mobile phones, as shown in Figure 3.6 and Figure 3.7 respectively.




[Step-by-step SMSChat with AINI: (1) go to Menu and select “Write Message”; (2) type your message, for example CONNECT YOURNAME (the screen shows “CONNECT AHMAD”); (3) enter AINI’s SMSChat number, for example 0166415196.]

Figure 3.6: SMSChat Interface
• PDAChat


Developing AINI for the Personal Digital Assistant (PDA) is an interesting way of
providing a more human and personalised interface between computer and human
[166-168]. PDAChat with AINI performs functions similar to WebChat, but in a mobile
environment. It is a prototype designed to blend mobile technology with natural
language, helping humans interact more naturally with mobile devices. An example of
PDAChat is shown in Figure 3.8. PDAChat was designed using WiFi technology and
powered by Microsoft Windows Mobile Technology [169], with Pocket Internet Explorer
embedded, on an HP iPAQ Pocket PC. The computer-generated avatar of the CA is
displayed on the PDA using Adobe Flash for Pocket PC 50, and the system uses a WiFi
wireless connection to AINI’s server.
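The SMSChat steps in Figure 3.6 suggest a simple command protocol in which the first message, `CONNECT YOURNAME`, registers the sender. The sketch below shows how a gateway handler might implement that first step; everything beyond the `CONNECT` keyword shown in the figure (the reply texts, keying sessions by sender number, the forwarding placeholder) is a hypothetical assumption.

```python
from typing import Dict


class SmsChatGateway:
    """Hypothetical SMS front end: 'CONNECT <NAME>' opens a chat session."""

    def __init__(self) -> None:
        self._names: Dict[str, str] = {}  # sender number -> display name

    def on_sms(self, sender: str, body: str) -> str:
        words = body.strip().split(maxsplit=1)
        if words and words[0].upper() == "CONNECT":
            if len(words) < 2:
                return "Please send CONNECT followed by your name."
            name = words[1].strip().title()
            self._names[sender] = name
            return f"Hi {name}, you are now chatting with AINI."
        if sender not in self._names:
            return "Please start by sending CONNECT YOURNAME."
        # Placeholder: a real gateway would pass the text to the agent brain.
        return f"[to agent brain] {self._names[sender]}: {body.strip()}"


if __name__ == "__main__":
    gateway = SmsChatGateway()
    print(gateway.on_sms("0100000000", "CONNECT AHMAD"))
    print(gateway.on_sms("0100000000", "what's ur name?"))
```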

3.5.2.3 MSNChat 

The flexibility and scalability of AINI’s Channel Service Tier support not only web and
mobile services but also CMC applications such as MSN Messenger. MSN Messenger is a
freeware instant messaging (IM) client that was developed and distributed by
Microsoft 51. The "language" used in IM communication between two computer
programs is called a "protocol", and the rules for messages sent between MSN
Messenger clients and servers are called the "MSN Messenger protocol". The
architecture of MSN Messenger is very complicated compared to other IM services such
as AIM and Yahoo!, since it relies on five different types of servers to handle the
communication and operation of its service 52. MSN Messenger uses the Mobile Status
Notification Protocol (MSNP) for communication. As shown in Figure 3.3, AINI’s agent
body uses the MSN protocol to communicate with MSN Messenger servers. AINI uses
.NET Passport to sign in to the MSN Messenger service. The MSN Messenger sign-in
session is based on a challenge-response mechanism to authenticate user credentials.
Communication with the Passport server is conducted over HTTPS (Hypertext Transfer
Protocol over Secure Sockets Layer), ensuring that the sign-in information is encrypted.
The client sends the challenge string, Passport username and password to the Passport
URL. If the sign-in credentials are confirmed, the Passport server issues a ticket, which is
passed back to the notification server to complete the authentication procedure.
Figure 3.9 details the entire authentication procedure for AINI and MSN Messenger.

50 http://www.adobe.com/software/flashplayer/pocketpc

51 http://get.live.com/messenger/

52 http://hypothetic.org/docs/msn/



Figure 3.9: AINI and MSN Authentication Process 
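The sign-in steps above can be traced with a toy simulation: the notification server issues a random challenge, the client sends the challenge and credentials to the Passport server, the Passport server returns a ticket, and the notification server checks that ticket. This is only an illustration of the challenge-response idea, not the real MSNP or Passport protocol: the HTTPS transport is replaced by direct function calls, and the ticket format (an HMAC-SHA256 over the username and challenge with a key shared by the two servers) is an assumption chosen for simplicity.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Optional


def _ticket(key: bytes, username: str, challenge: str) -> str:
    # Assumed ticket format: HMAC-SHA256 over "username|challenge".
    return hmac.new(key, f"{username}|{challenge}".encode(), hashlib.sha256).hexdigest()


class PassportServer:
    """Toy stand-in for the Passport server (HTTPS replaced by calls)."""

    def __init__(self, password_hashes: Dict[str, str], shared_key: bytes) -> None:
        self._hashes = password_hashes  # username -> SHA-256 hex of password
        self._key = shared_key

    def issue_ticket(self, challenge: str, username: str, password: str) -> Optional[str]:
        stored = self._hashes.get(username)
        offered = hashlib.sha256(password.encode()).hexdigest()
        if stored is None or not hmac.compare_digest(stored, offered):
            return None
        return _ticket(self._key, username, challenge)


class NotificationServer:
    """Toy stand-in: issues a challenge and later verifies the ticket."""

    def __init__(self, shared_key: bytes) -> None:
        self._key = shared_key
        self._pending: Dict[str, str] = {}

    def start_sign_in(self, username: str) -> str:
        challenge = secrets.token_hex(16)
        self._pending[username] = challenge
        return challenge

    def complete_sign_in(self, username: str, ticket: Optional[str]) -> bool:
        challenge = self._pending.pop(username, None)
        if challenge is None or ticket is None:
            return False
        return hmac.compare_digest(_ticket(self._key, username, challenge), ticket)


def sign_in(ns: NotificationServer, passport: PassportServer,
            username: str, password: str) -> bool:
    challenge = ns.start_sign_in(username)                         # 1. challenge
    ticket = passport.issue_ticket(challenge, username, password)  # 2. credentials, 3. ticket
    return ns.complete_sign_in(username, ticket)                   # 4. ticket to notification server
```

Each challenge is used once (it is removed from `_pending` on completion), so a ticket captured from one sign-in cannot be replayed against a later challenge in this toy model.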


In the MSNChat module, we have outlined the conceptual and practical basis for
developing AINI’s MSNDesktopChat, MSNWebChat and MSNMobileChat sub-modules,
all of which are supported by the MSN Messenger protocol.
• MSNDesktopChat


MSNDesktopChat runs on MSN Messenger for desktop, reported to be the most popular
free instant messaging client software [170], with 61 percent of the worldwide IM
market. MSN Messenger was developed and distributed by Microsoft from 1999 and was
renamed Windows Live Messenger in 2006. Its main use is IM, although other features
that now come as standard include support for voice conversations, webcams, file
transfers and built-in two-player online games. Like many of MSN Messenger's
competitors, MSNDesktopChat allows messages to be enhanced with graphical
emoticons (sometimes called ‘smileys’), Flash animations called ‘winks’, animated
display pictures, styled text and many other accoutrements, some via third-party add-ons,
as shown in Figure 3.10.




Figure 3.10: MSNDesktopChat Interface
Figure 3.11: MSNWebChat Interface


• MSNWebChat 

The MSNWebChat module allows users to interact with the AINI software agent in real
time via a browser through MSN Web Messenger. Virtually any computer with an
Internet connection can connect to the Messenger Service using MSN Web Messenger.
However, unlike MSNDesktopChat, the MSNWebChat interface does not mimic the MSN
Messenger application. Currently, these web-based chat sessions support only plain
text, as shown in Figure 3.11. Nevertheless, MSNWebChat acts as a collaborative
browser which allows a portal to guide users through the organisation’s website by
automatically “pushing” URLs and information from other websites to the user’s
browser. This not only facilitates communication between the CA and users, but also
allows the intelligent CA to help users locate specific information on their websites.

• MSNMobileChat 

MSNMobileChat offers the same features as MSNDesktopChat. With MSNMobileChat,
MSN Mobile device users can page AINI’s users with the pag command and receive
incoming messages with the ipg command. Mobile chat is an alternative way for users to
chat with AINI using GPRS, WiFi and 3G services. At the same time, wireless telephony
networks using 3G offer smartphone users a subset of TCP/IP networking services such
as IM. This service offers far broader coverage than WiFi and is powered by Microsoft
Windows Mobile Technology [169]. In this interface, AINI provides text-based interactive
information services and applications on the screens of users' PDAs or smartphones, as
shown in Figure 3.12.



[Panels: Smartphone; PDA]

Figure 3.12: MSNMobileChat Interface
3.5.2.4 Proxy Conversation Example 1


The agent body provides the necessary interaction between the user and AINI’s control
through the communication channel service tier. In this proxy conversation 53 example,
users interact with AINI through the normal Internet ports, which are connected to an
agent body that provides communication between AINI and users through a Web
Collaborative Module comprising WebGuide, WebTips and WebSearch, as shown in
Figure 3.13. This communication channel module was developed in previous research
[171]. The purpose of WebGuide is to guide users through the entire portal, enabling
AINI to offer help without waiting for the user to ask. The WebTips engine provides tips
or hints to users; it is an intuitive feature that recommends links within the site. The
purpose of WebSearch is to assist users through the Google search engine. The web
collaborative module also allows AINI to “guide” users through the organisation’s
website by automatically “pushing” URLs to the user’s browser (also known as the
“URL Push Technique”). This not only facilitates communication between AINI and the
user, but also allows AINI to help users locate specific information on their website.
Figure 3.13: AINI’s Communication Channel Module


53 This is a localhost proxy conversation which can be used with almost any client in order to participate
in Off-the-Record conversations.
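The WebTips and URL push behaviour described above could be approximated by simple keyword matching, as in the following sketch. The site map, keywords, URLs (under the reserved `example.com` domain) and the shape of the push command are all hypothetical; a real collaborative browser would also need a client-side script to act on the command.

```python
import re
from typing import Dict, List, Optional

# Hypothetical site map for an organisation's portal.
SITE_MAP: Dict[str, str] = {
    "savings": "https://www.example.com/accounts/savings",
    "loan": "https://www.example.com/loans",
    "branch": "https://www.example.com/branches",
}


def recommend_links(message: str, site_map: Dict[str, str] = SITE_MAP) -> List[str]:
    """WebTips-style hints: pages whose keyword appears as a word in the message."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    return [url for keyword, url in site_map.items() if keyword in words]


def push_command(message: str) -> Optional[Dict[str, str]]:
    """URL push: ask the user's browser to open the first matching page."""
    links = recommend_links(message)
    if not links:
        return None
    return {"action": "push_url", "url": links[0]}


if __name__ == "__main__":
    print(push_command("I want to know more about Yippie Savings Account."))
```

Exact word matching keeps the sketch predictable but misses plurals such as "loans"; a production version would at least apply stemming.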
However, if certain predetermined conditions are not met and AINI is unable to retrieve
the information required by users, AINI will, through another communication module,
automatically present an email form that can be filled in and forwarded to a call centre. If
users require information instantly, AINI will provide the telephone number of the
relevant person in the call centre, so that users can call a live agent (human) directly for
further details. If users feel that phone calls are expensive (especially long-distance
calls), AINI will instead offer an instant messaging service using MSN Messenger
through the agent interface module, as discussed in section 3.5.2.3. Instant messaging
enables the user to interact with live agents in the call centre via IM. The live agent at the
call centre is able to trace the discussion, as well as the user's profile, as shown in
Figure 3.14.



[Figure panels: User’s Site; Call Centre’s Site]

Figure 3.14: An example of a user communicating with a live agent (human) through MSN Messenger
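The fallback order described above can be read as a small decision function. The sketch below is only one reading of the text, not AINI’s code. The condition names (`answered`, `needs_instant_reply`, `call_is_costly`) and the returned channel labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QueryOutcome:
    answered: bool             # AINI retrieved the information itself
    needs_instant_reply: bool  # the user wants the information immediately
    call_is_costly: bool       # e.g. a long-distance telephone call


def escalation_channel(outcome: QueryOutcome) -> str:
    """Pick a channel following the fallback order described in the text."""
    if outcome.answered:
        return "aini_answer"
    if not outcome.needs_instant_reply:
        return "email_form"          # filled in by the user and sent to the call centre
    if outcome.call_is_costly:
        return "instant_messenger"   # live agent reached through IM
    return "call_centre_phone"       # telephone number of the relevant person


print(escalation_channel(QueryOutcome(False, True, True)))  # instant_messenger
```

Checking the cheapest sufficient channel first keeps the live agents in the call centre as the last resort.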


The excerpts in Figure 3.15 are taken from a typical single session of proxy conversation between AINI and a user called “Ahmad” at the Maybank portal [171]. This session illustrates the nature of human-machine communication. The conversation continues from the moment AINI transferred the query via text chat and voice chat with the avatar. The chat-based system allows the user to interact with the system in a manner that simulates a normal conversation. Typical inputs include greetings, questions, answers, orders, remarks and comments, and the agent is able to respond to each of these.


#     ID      Message
1     AINI    Hi there : )
2     Ahmad   what's ur name?
3     AINI    My name is Aini. What is your name?
4     Ahmad   Ahmad
5     AINI    How are you, Ahmad?
6     Ahmad   Wonderful!
7     AINI    Good to hear that! I feel wonderful too!
8     Ahmad   Oh yes it is. Are you alive?
...
18    Ahmad   Thank you. I want to know more about Yippie Savings Account.
Figure 3.15: An example of Proxy Conversation from a Financial Portal based on 
WebChat 
3.5.3. Agent Knowledge (Data Server Tier)

Another significant difference between this research and other research on CAs is the domain knowledge model. Dahlback and Jonsson [172] stressed that the domain model represents the structure of the knowledge, which comprises a subset of general knowledge. Such systems normally fall into two subcategories:

- the traditional (narrow, or domain-specific) domain;
- the open domain.

In the traditional domain, systems attempt conversational fluency within limited domains of expertise. ELIZA [35], for example, simulates a Rogerian psychotherapist through its best-known script, DOCTOR. PARRY [33], by contrast, restricted its domain to the expressions of a paranoid hospital patient. SHRDLU [92] is another program simulating a CA; it is able to interact within a simple world of “blocks” (see Section 2.4.1). Early Loebner Prize competitions likewise evaluated entries on restricted tasks [109]. However, at the Fifth Annual Loebner Prize Contest in 1995, the criteria were changed to include unrestricted domains [27]. Computer entries were then required to converse indefinitely with no topic restrictions.

Hence, general-purpose CAs are not necessarily able to answer questions on a specific domain subject, while domain-specific systems lack the flexibility to handle common sense questions. To overcome these limitations, we proposed the Domain Knowledge Matrix Model (DKMM) [173]. The data server layer stores the data and knowledge required by the system, including AINI’s conversational knowledge bases. It is well understood that truly intelligent action requires large quantities of knowledge. Such a reservoir of knowledge can be harvested from the Internet and deployed in the domain matrix knowledge base architecture. This forms the basis for constructing large-scale knowledge bases that serve as the engine for intelligent conversation systems. AINI is the mechanism used to manage this knowledge and to provide appropriate answers to the user.

3.5.3.1 Domain Knowledge Matrix Model (DKMM) 

AINI’s DKMM incorporates several knowledge subjects, which is analogous to consulting multiple experts. For example, a sales knowledge domain should contain expertise on how to improve sales. However, a salesperson is also expected to have a wide range of common sense and general knowledge, such as the biographies of the world’s foremost inventors [174]. This kind of knowledge enables CAs to engage potential customers in general conversation. Hence, an intelligent system should also incorporate open-domain knowledge to handle general or generic questions. By including multiple domain knowledge bases within its single knowledge domain, AINI will be able to hold “meaningful” and prolonged conversations with users.


[Figure panel label: Knowledge Domains]

Figure 3.16: Domain Knowledge Matrix Model (DKMM)
In the proposed DKMM [175], both the open-domain and domain-specific knowledge bases are predefined in the agent knowledge tier. These modules support the various knowledge levels at the agent brain tier. Depending on the user’s input, the agent will respond or switch from one level to another in the agent brain (discussed in section 3.5.4). Although the system is capable of communicating with the user beyond its knowledge domain, there are cases where it exhausts its capability to answer the queries. In such cases, the system attempts to divert the focus back to the current topic of interest by responding with predefined random statements. The purpose is to direct the user’s attention back to the system’s domain-specific state. Hence, AINI will attempt to “cycle” between the six levels of information processing within the agent brain tier. These levels are supported by the various knowledge modules in the agent knowledge tier discussed in Section 3.5.3.

One way to view the proposed DKMM is given in Figure 3.16. In this approach, AINI’s knowledge base can be considered a collection of specific conversation domain units. Each unit handles a specific body of knowledge used during the conversation between AINI and the user. The knowledge is arranged in vertical columns making up the open-domain or domain-specific knowledge, while specific subjects are shown in the horizontal rows. For example, in the open-domain knowledge, the subject units cover topics such as personality, business, biology and computers. In this research, our focus is on the subject of medicine, in particular the bird flu pandemic. Therefore, additional bird flu domain knowledge is incorporated in the row “medical” under the domain-specific column NL-Corpus. Details of how the bird flu knowledge base is extracted from the web are discussed in Chapter 4.
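The row/column arrangement of the DKMM can be pictured as a sparse matrix keyed by (subject, knowledge source). The sketch below is a minimal illustration under assumptions, not AINI’s storage layout; the class name `DomainKnowledgeMatrix` and its methods are hypothetical. The column names follow the sources named in this section, and the sample stimulus-response pair comes from Figure 3.15.

```python
from collections import defaultdict

# Knowledge-source columns and the domain type each belongs to (Figure 3.16).
COLUMNS = {
    "MindPixel": "open-domain",
    "TREC": "open-domain",
    "AAA": "open-domain",
    "NL-Corpus": "domain-specific",
    "FAQ": "domain-specific",
}


class DomainKnowledgeMatrix:
    """Hypothetical DKMM sketch: rows are subjects, columns are knowledge sources."""

    def __init__(self):
        # (subject, column) -> list of (stimulus, response) pairs
        self.cells = defaultdict(list)

    def add(self, subject, column, stimulus, response):
        if column not in COLUMNS:
            raise KeyError(f"unknown knowledge column: {column}")
        self.cells[(subject, column)].append((stimulus, response))

    def units(self, domain_type):
        """Return the (subject, column) units that belong to one domain type."""
        return sorted(key for key in self.cells if COLUMNS[key[1]] == domain_type)


dkmm = DomainKnowledgeMatrix()
dkmm.add("personality", "AAA", "what is your name", "My name is Aini.")
dkmm.add("medical", "NL-Corpus", "what is bird flu", "<answer extracted by AKEA>")
print(dkmm.units("domain-specific"))  # [('medical', 'NL-Corpus')]
```

Keying cells by (subject, source) lets a new source be added as a new column without touching existing units, which is in the spirit of the plug-in arrangement described next.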
In this research, the novel contribution is the development of the DKMM, a “domain knowledge plug-in module” attached through the domain service tier (see section 3.5.1.2). With this arrangement, the domain-specific and open-domain knowledge become portable and scalable, and can easily be incorporated into other domain applications. This approach also allows future improvements through collaborative contributions from other domain applications and tasks.

3.5.3.2 Open-Domain Knowledge Bases

Because we are merely making an approximation of the world’s general knowledge, any knowledge generated or manipulated in the domain of the system is collectively referred to as domain knowledge. Some literature refers to this as ‘world knowledge’, where “world” actually means the world as seen by the system, not the real world. Because the term “world knowledge” can be confusing, it is not used here. Another type of knowledge, acquired through experience and unrelated to any specific domain, is common sense or general knowledge. A remark attributed to Albert Einstein [176] points to the deep misconceptions people hold in viewing the world: “Common sense is that layer of prejudices laid down in the mind prior to the age of eighteen”. For example, we cannot tell machines that “all swans are white”, even though many people believe so, because some swans are in fact black. It is therefore extremely difficult to instruct machines on what is true when we ourselves have problems separating truth from make-believe. Researchers such as Saba [177] have made progress in generating common sense knowledge. Even so, providing machines with common sense knowledge is still not straightforward, owing to the various problems highlighted in the roadmap by Singh & Minsky [178].
Open-domain conversational systems need to deal with questions about general knowledge; more specifically, knowledge about the facts of the subject of a given human conversation (stimulus and response). It is very difficult to rely on ontological information because wide yet detailed banks of world knowledge are lacking. On the other hand, these systems have much more information and data available for answering queries than any domain-specific system. AINI’s conversation system deploys two existing resources:

- information from MindPixel [45], a large-scale mass collaboration;
- factoid training data sets from the Text Retrieval Conference (TREC) training corpus [179].

AINI also uses the hand-crafted knowledge bases of Annotated ALICE AIML (AAA) [51] and of the Loebner Prize [180] and Chatterbox Challenge [181] winners, all of which are based on unrestricted domains. These are illustrated in the agent knowledge in Figure 3.3 and under the open-domain columns in Figure 3.16.

MindPixel is a common sense knowledge component similar to OpenMind 54 and Cyc 55. The system accepts public contributions. However, the Cyc model and OpenMind had bottlenecks that prevented truly large-scale collaboration [182]. In Cyc, for instance, the knowledge does not grow by itself. Every new rule or axiom has to be entered manually, which takes a great deal of time and patience. The information also has to be input in the CycL programming language and must follow the rules of the system. The second drawback of Cyc is its complexity: installing and implementing a system based on the Cyc knowledge base can take months. In contrast, the MindPixel corpus contains nearly two million propositions of human common sense, constructed by more than 50,000 people over a period of five years. Each entry in the file is a question with a yes/no or true/false answer, known as a mindpixel.

54 http://www.openmind.org

55 http://www.cyc.com

This is a much simpler and perhaps more concise strategy than the one adopted by the OpenMind system. It also has a significant advantage: the answer to each question has been cross-validated by multiple users, which gives each proposition a statistical probability of truth. Moreover, the MindPixel corpus is freely available to the public for research purposes via GAC-80K 56. AINI uses only 5% of the MindPixel propositions, that is, 100,000 of them. In practice, 5% of the training corpus is held back from training as a generalisation test, to ensure that the system does not simply memorise the corpus. Passing this generalisation test would be the basis for claiming that the system is able to replicate human-level intelligence in a machine. A great deal of knowledge has been collected. Nevertheless, the system still covers only a fraction of the innumerable “pieces” of common sense knowledge that are estimated to be involved in human intelligence [183] and in the model of human thought [184]. Here are a few examples:

Does Microsoft usually admit its mistakes? 

Is a spoon an item of cutlery? 

Is artificial intelligence a waste of time? 
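Two ideas above can be illustrated together: the cross-validation that gives each mindpixel a probability of truth, and holding back 5% of the corpus as a generalisation test. The sketch below is a minimal illustration under stated assumptions. It treats the probability of truth as the fraction of “yes” validations, and it assigns propositions to the held-out set by hashing them. Both choices, and the function names, are hypothetical rather than MindPixel’s documented procedure.

```python
import hashlib


def truth_probability(yes_votes: int, no_votes: int) -> float:
    """Fraction of validators who answered 'yes' / 'true' to a mindpixel."""
    total = yes_votes + no_votes
    if total == 0:
        raise ValueError("proposition has not been validated yet")
    return yes_votes / total


def is_held_out(proposition: str, fraction: float = 0.05) -> bool:
    """Deterministically place about `fraction` of propositions in the test set."""
    digest = hashlib.sha256(proposition.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # roughly uniform in [0, 1)
    return bucket < fraction


# Hypothetical vote counts for one of the example propositions above.
print(truth_probability(9, 1))                     # 0.9
print(is_held_out("Is a spoon an item of cutlery?"))
```

Hashing, rather than random sampling, keeps each proposition on the same side of the split every time the corpus is reloaded.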

A second common sense knowledge component deployed by AINI is a training corpus from TREC, as shown in Table 3.1. TREC is organized each year by the National Institute of Standards and Technology (NIST). Since 1999 it has offered a specific track to evaluate large-scale open-domain question answering (QA) systems. Finding textual answers to open-domain questions in large text collections is a difficult problem. In this system, only factoid questions are extracted and incorporated into AINI’s engine. In conversational systems, factoid questions should have only a single factual answer [185-188], which makes them a good stimulus-response type of knowledge unit. Examples of such questions are:

- “Who is the author of the book, The Iron Lady: A Biography of Margaret Thatcher?”
- “What was the name of the first Russian astronaut to do a Spacewalk?”
- “When was the telegraph invented?”

TREC’s corpus has a considerably lower rate of answer redundancy than the web, and thus it is easier to answer a question by simply extracting the answers from the matching text. To gather this data, we automatically classified the questions in the TREC 8, 9 and 11 test sets by their ‘wh’-word. We then manually identified the factoid questions, which represented around half of the initial corpus, as shown in Table 3.1.

56 http://sluggish.um.cc/nrindpixel/gac80k-06-iulv-2005.html


Table 3.1: Number of Factoid Questions from TREC 8, 9 and 11

TREC   Factoid Questions   Text Research Collection
8      196                 • Financial Times Limited (1991, 1992, 1993, 1994)
                           • the Congressional Record of the 103rd Congress (1993), and the Federal Register (1994)
                           • Foreign Broadcast Information Service (1996) and the Los Angeles Times (1989, 1990)
9      692                 Set of newspaper/newswire documents, which includes:
                           • AP newswire
                           • Wall Street Journal
                           • San Jose Mercury News
                           • Financial Times
                           • Los Angeles Times
                           • Foreign Broadcast Information Service
11     109                 • MSNSearch logs donated by Microsoft
                           • AskJeeves logs donated by Ask Jeeves
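The automatic first pass over the TREC questions classified each one by its ‘wh’-word before the manual factoid selection. A minimal sketch of such a classifier follows. The word list and the rule “first wh-word in the question wins” are assumptions; a question such as “Name the city where…” would be filed under “where”, which is one reason a manual check is still needed. The sample questions are the three examples quoted above.

```python
import re
from collections import Counter

WH_WORDS = {"what", "which", "who", "whom", "whose", "when", "where", "why", "how"}


def wh_class(question: str) -> str:
    """Return the first wh-word in the question, or 'other' if there is none."""
    for token in re.findall(r"[a-z]+", question.lower()):
        if token in WH_WORDS:
            return token
    return "other"


questions = [
    "Who is the author of the book, The Iron Lady: A Biography of Margaret Thatcher?",
    "What was the name of the first Russian astronaut to do a Spacewalk?",
    "When was the telegraph invented?",
]
print(Counter(wh_class(q) for q in questions))
```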


The third knowledge base in AINI’s open-domain knowledge model is the hand-crafted Annotated ALICE AIML (AAA) 57. This is the knowledge base of ALICE, a Loebner Prize-winning conversation system [23] (see Section 2.4.2). AAA is a free and open-source software package based on XML specifications. It is a set of AIML scripts that forms the backbone of the award-winning conversation system. AAA was specifically reorganised to help conversational system developers clone the ‘brain’ of the conversation system and create customised CA personalities. This reduces the effort needed to edit the original AAA content. AAA’s knowledge bases cover a wide range of subject domains based on the conversation agent’s “personality”. Examples include AI, games, emotion, economics, film, books, sport, science, epistemology and metaphysics. These subjects are shown in Figure 3.16 as part of the DKMM. Integrating this common sense knowledge may help earlier CAs to transcend their present limitations.

57 http://www.alicebot.org/aiml/aait/
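An AIML knowledge base is a set of stimulus-response “categories”, each pairing a `<pattern>` with a `<template>`. The sketch below shows that structure and loads it into a lookup table with Python’s standard XML parser. The two categories are illustrative, with responses taken from Figure 3.15, and are not copied from AAA. Only exact patterns with plain-text templates are handled; wildcards and nested AIML tags such as `<srai>` are ignored.

```python
import re
import xml.etree.ElementTree as ET

# Two illustrative categories in AIML syntax (responses from Figure 3.15).
AIML = """<aiml version="1.0.1">
  <category>
    <pattern>WHAT IS YOUR NAME</pattern>
    <template>My name is Aini. What is your name?</template>
  </category>
  <category>
    <pattern>WONDERFUL</pattern>
    <template>Good to hear that! I feel wonderful too!</template>
  </category>
</aiml>"""


def load_categories(aiml_text: str) -> dict:
    """Map each upper-cased pattern to its plain-text template."""
    categories = {}
    for category in ET.fromstring(aiml_text).iter("category"):
        pattern = category.findtext("pattern", default="").strip().upper()
        template = category.findtext("template", default="").strip()
        categories[pattern] = template
    return categories


def normalise(utterance: str) -> str:
    """Upper-case, drop punctuation and collapse spaces, roughly as AIML input is normalised."""
    return " ".join(re.sub(r"[^\w\s]", " ", utterance).upper().split())


kb = load_categories(AIML)
print(kb.get(normalise("What is your name?")))  # My name is Aini. What is your name?
```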

3.5.3.3 Domain-Specific Knowledge Bases 

At present, the World Wide Web provides a distributed hypermedia interface to a vast amount of information available online. For instance, Google [189] has compiled a training corpus of more than one trillion words (1,024,908,267,229) from public web pages, which is valuable for many types of research. The Web is a potentially unlimited knowledge repository. However, commercial search engines may not be the best way to gather answers to queries, because a search returns an overwhelming number of results.

As shown in Figure 3.3 and Figure 3.16, AINI’s domain-specific knowledge bases consist of the Natural Language Corpus and Frequently Asked Questions (FAQ). Both are extracted from online documents using AKEA, as described in [156]. Another significant objective of this research is for AINI to deliver essential information from trustworthy [190] sources (discussed in Chapter 4) while interacting with users.

3.5.3.4 Stimulus-Response Categories in AINI’s Knowledge Bases 

Currently 58, AINI’s knowledge bases contain more than 160,000 stimulus-response categories. In the open-domain knowledge base:

- 100,000 common sense categories came from MindPixel;
- 997 factoid questions came from the TREC training corpus;
- 45,318 categories came from the AAA knowledge bases.

In the domain-specific knowledge base, AINI has more than 1,000 online documents extracted by AKEA. It also has 158 FAQ question-answer pairs, which have been updated using AKEA. In addition, AINI has collected more than 52,890 utterances in conversations with online users since its first prototype was put online in February 2006. These utterances are being integrated into AINI’s knowledge bases through supervised learning by domain experts (see details in 3.5.4.7). So far, AINI has learnt about 5,000 categories from conversations with online users. All of this combined knowledge makes up the total of 161,473 stimulus-response categories in AINI’s knowledge bases, as summarised in Table 3.2. For comparison, the original ELIZA, written by Professor Joseph Weizenbaum of MIT, had only about 200 stimulus-response categories. ALICE Silver Edition, which was ranked the “most human” computer, has about 120,000 categories, including 80,000 taken from MindPixel.

58 As of 1 August 2007, AINI had 161,473 stimulus-response categories in its knowledge base.


Table 3.2: AINI’s Stimulus-response Categories

Domain                Knowledge Source      Categories   %
Domain-Specific       NL Corpus             10,000       6.19%
                      FAQ                   158          0.10%
Open-Domain           MindPixel             100,000      61.93%
                      TREC Corpus           997          0.62%
                      AAA                   45,318       28.07%
Supervised Learning   Conversation Logs     5,000        3.10%
TOTAL                                       161,473
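The percentages in Table 3.2 are each source’s share of the total category count. The short check below recomputes them from the counts in the table. Nothing in it is taken from AINI itself.

```python
# Category counts per knowledge source, copied from Table 3.2.
CATEGORIES = {
    ("Domain-Specific", "NL Corpus"): 10_000,
    ("Domain-Specific", "FAQ"): 158,
    ("Open-Domain", "MindPixel"): 100_000,
    ("Open-Domain", "TREC Corpus"): 997,
    ("Open-Domain", "AAA"): 45_318,
    ("Supervised Learning", "Conversation Logs"): 5_000,
}

total = sum(CATEGORIES.values())
for (domain, source), count in CATEGORIES.items():
    share = 100 * count / total  # percentage of all categories
    print(f"{domain:<20} {source:<18} {count:>8,} {share:6.2f}%")
print(f"{'TOTAL':<39} {total:>8,}")
```

The counts sum to the 161,473 total, and the rounded shares match the percentage column of the table.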



3.5.3.5 Proxy Conversation Example 2 

Merging multiple domain knowledge bases into AINI’s single domain knowledge base through the DKMM can prolong the human-computer conversation. We predefined the open-domain and domain-specific knowledge bases in the agent knowledge tier. The conversation agent allows users to converse freely. Even so, it keeps reminding users of the current topic of the presentation and brings them back to it, in order to direct their attention back to its original open-domain or domain-specific state. In this case study [89] 59, however, priority is given to the domain-specific topic, SARS. Therefore, the conversation agent always gives higher priority to domain-specific questions in an attempt to keep the user focused on the topic of the presentation. An example of a domain knowledge model conversation between “John” and “AINI” is shown in Figure 3.17.


#     ID      Message
1     John    Hi
2     AINI    Hello there. What is your name?
3     John    My name is John
4     AINI    Nice to meet John!
5     John    You still remember when was SARS first carried out into the world?
6     AINI    Yes. SARS was first carried out into the world at large on 21 February, at Metropole Hotel, Hong Kong.
7     John    When did Singapore mark on the SARS-free anniversary?
8     AINI    Monday, 24 May 2004
9     John    Thanks for your information. When WHO declare world-free SARS?
10    AINI    On July 5, 2003, the World Health Organization declared the world free of ongoing SARS Transmission.
11    John    Ok, now why not we talk about H5N1 Bird flu.
12    AINI    I'm not sure how this related to SARS? Why don't you ask me more specific on SARS?
13    John    No, I am more interested on the current pandemic, H5N1?
14    AINI    I shall do a search for it.


Figure 3.17: An example of Proxy Conversation on SARS based on Domain 
Knowledge 


The conversation between John and AINI in Figure 3.17 took place during a presentation on the SARS epidemic crisis. It moves through three kinds of transition state:

- Dialogue turns #1 to #4 draw on the open-domain knowledge.
- Turns #5 to #10 draw on the domain-specific knowledge.
- In turns #11 to #13, the user takes control of the conversation, although the agent reminds the user of the topic of the current presentation.

In turn #14, AINI is in the random state, and its reply is taken from a predefined set of random answers. Examples are “I shall do a search for it.”, “Did I misunderstand your meaning?” and “That's an interesting question. I'll come back to that in a minute.”. Such exchanges are monitored, and the unanswered queries are submitted to the unanswered database for supervised learning. The domain expert is free to modify and enhance the unanswered database set and then forward it to the FAQ knowledge base for future conversations. In this proxy conversation example, the H5N1 bird flu pandemic knowledge bases were not yet available. This knowledge was subsequently added in this research, as described in Chapter 4.

59 Domain-specific knowledge was extracted using AKEA in previous research during the severe acute respiratory syndrome (SARS) epidemic in 2003. That research was funded by Universiti Teknikal Malaysia Melaka under grant contract number PJP/2003/FTMK(1) (S017).
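The behaviour in Figure 3.17 combines two rules: domain-specific knowledge is consulted before open-domain knowledge, and an unanswerable query triggers a predefined random reply and is logged for supervised learning. The sketch below illustrates both under simplifying assumptions. Exact-match dictionaries stand in for AINI’s knowledge bases, and the function `respond` is hypothetical. The topic reminder seen in turn #12 is not modelled.

```python
import random

# Random-state replies quoted in the text above.
RANDOM_ANSWERS = [
    "I shall do a search for it.",
    "Did I misunderstand your meaning?",
    "That's an interesting question. I'll come back to that in a minute.",
]


def respond(query, domain_kb, open_kb, unanswered, rng=random):
    """Return (reply, state); domain-specific knowledge is consulted first."""
    key = query.strip().lower()
    if key in domain_kb:
        return domain_kb[key], "domain-specific"
    if key in open_kb:
        return open_kb[key], "open-domain"
    unanswered.append(query)  # kept for review by the domain expert
    return rng.choice(RANDOM_ANSWERS), "random"


domain_kb = {"when did singapore mark on the sars-free anniversary?": "Monday, 24 May 2004"}
open_kb = {"hi": "Hello there. What is your name?"}
unanswered = []
print(respond("Hi", domain_kb, open_kb, unanswered))
print(respond("Tell me about H5N1", domain_kb, open_kb, unanswered, random.Random(0)))
print(unanswered)  # ['Tell me about H5N1']
```

Passing the random generator as a parameter makes the random state reproducible in tests while keeping the default behaviour random.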

3.5.4. Agent Brain (Application Server Tier)

In the classical CA design, the original ELIZA was written in the MAD-SLIP programming language. The technique used in a CA’s “database” or “script file” to represent its knowledge is known as Case-Based Reasoning (CBR) [35]. The original ELIZA had only about 200 rules, and 90% of its stimulus-responses are found in the associated script file. Loebner Prize [23] winning CAs have since surpassed these limitations. New features have been added with each submission, and the “script file” that acts as the CA’s brain has been completely rewritten and improved over earlier versions of the programs. Such CAs include ALICE, one of the ‘most human’ natural language CAs, and its various clones, known as ALICEbots [38]. ALICE uses AIML, a language specific to the program. Nevertheless, ALICE is still based on pattern matching and case-based reasoning (PMCBR) and provides no natural language understanding or reasoning. This is the same technique that ELIZA, the first CA, used back in 1966.
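Pattern matching of the kind used by ELIZA scripts and AIML categories matches the input against patterns containing wildcards and copies the captured words into a response template. The sketch below is a minimal illustration with hypothetical rules. It simplifies in two ways: the first matching rule wins (AIML interpreters instead rank matches by pattern specificity), and ELIZA-style pronoun swapping is omitted.

```python
import re

# Hypothetical stimulus-response rules; '*' matches one or more words.
RULES = [
    ("MY NAME IS *", "Nice to meet you, {0}!"),
    ("I AM *", "How long have you been {0}?"),
    ("*", "Please go on."),
]


def to_regex(pattern: str) -> re.Pattern:
    """Compile a wildcard pattern into a case-insensitive whole-input regex."""
    parts = [r"(.+)" if word == "*" else re.escape(word) for word in pattern.split()]
    return re.compile("^" + r"\s+".join(parts) + "$", re.IGNORECASE)


def reply(utterance: str) -> str:
    """Return the template of the first rule whose pattern matches the whole input."""
    text = " ".join(re.sub(r"[^\w\s]", " ", utterance).split())
    for pattern, template in RULES:
        match = to_regex(pattern).match(text)
        if match:
            return template.format(*match.groups())
    return ""


print(reply("My name is John"))  # Nice to meet you, John!
print(reply("I am tired."))      # How long have you been tired?
```

The catch-all `*` rule sits last so that the more specific rules get the first chance to match.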
3.5.4.1 Multilevel Natural Language Query

In AINI, communication with users takes place through typed text messages, which are processed as natural language queries. AINI’s engine implements its decision-making network based on the information it encounters in the six levels of natural language modules, as shown in Figure 3.18 and discussed in [191, 192]. The input and output of each module is an XML-encoded data structure that keeps track of the current computational state. The knowledge modules can therefore be considered as transformations over this XML data structure. The system accepts queries from users and processes them based on the information contained in AINI’s knowledge bases. It is implemented on an open-source LAMP-based architecture, with the knowledge bases stored in a MySQL server. All the domain services are written in the Perl scripting language. Perl was chosen because it supports objects, modules, arbitrary data structures, classes, methods and inheritance. In addition, Perl has a huge collection of modules freely available from CPAN (Comprehensive Perl Archive Network) 60.
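The idea that each knowledge module is a transformation over a shared XML state can be sketched as follows. This is an illustration only, written in Python rather than the Perl used by AINI. The element names (`state`, `query`, `normalised`, `answer`) and both module functions are hypothetical stand-ins.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for two knowledge modules' data.
CORRECTIONS = {"flo": "flu"}
FAQ = {"what is bird flu": "(FAQ answer text)"}


def spell_check(state: ET.Element) -> ET.Element:
    """Level 0 stand-in: add a <normalised> element with corrected words."""
    words = state.findtext("query", default="").lower().split()
    ET.SubElement(state, "normalised").text = " ".join(CORRECTIONS.get(w, w) for w in words)
    return state


def faq_lookup(state: ET.Element) -> ET.Element:
    """Level 2 stand-in: add an <answer> element if the FAQ has a match."""
    answer = FAQ.get(state.findtext("normalised", default=""))
    if answer is not None and state.find("answer") is None:
        ET.SubElement(state, "answer", level="2").text = answer
    return state


def run(query: str, modules) -> ET.Element:
    """Build the initial XML state and pass it through each module in turn."""
    state = ET.Element("state")
    ET.SubElement(state, "query").text = query
    for module in modules:  # each module transforms the shared XML state
        state = module(state)
    return state


print(ET.tostring(run("What is bird flo", [spell_check, faq_lookup]), encoding="unicode"))
```

Because every module reads and writes the same document, a module can be added or removed without changing the others, which matches the plug-in design described for the six levels.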

The agent brain tier handles query processing, that is, the business logic. Here, one or more domain service tiers are configured to compute the dialogue logic through the multilevel natural language query algorithm. This tier is based on a goal-driven, or top-down, natural language query (NL-Query) approach, which is similar to the way humans process language. The literature in Natural Language Processing (NLP) widely favours the top-down approach:

- Mentalese, or the ‘language of thought’, and conceptual representation support a top-down approach [193].
- Research on generation schemas [194], rhetorical structure theory [195], summarisation [196] and plan-based approaches [197] also supports it.
- SHRDLU [92], the first CA to use NLU, is another example of a top-down approach.

Therefore, AINI’s agent brain uses a top-down NL-query approach to simulate human conversation. In robotic design, however, the MIT Cog Robot research fervently supports the bottom-up approach to modelling the human brain [198].

60 http://www.cpan.org/modules/01modules.index.html


As shown in Figure 3.18, the top-down multilevel natural language query approach consists of six levels of queries:

- Level 0: Spell Checker
- Level 1: Natural Language Understanding and Reasoning (NLUR)
- Level 2: FAQChat
- Level 3: Index Search
- Level 4: Pattern Matching and Case-based Reasoning (PMCBR)
- Level 5: Supervised Learning

All these levels were designed as modules and can be plugged in easily without changing the entire code.

[Figure 3.18 labels: spelling checker recognising frequently made typos, misspellings and misconceptions in users’ queries; full-discourse NLUR through a network-based advanced reasoning technique; FAQ capturing the logical ontology of a given domain without natural language processing or logical inference; search by identifying keywords using probabilistic/statistical approaches; PMCBR algorithm; Open-Domain Knowledge Base (AAA); supervised learning by domain expert; Supervised Domain Knowledge Base; conversation logs]

Figure 3.18: Multilevel Natural Language Query
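One way to read the top-down flow through the six levels is as a cascade. Level 0 always cleans the query. Levels 1 to 4 are then tried in order until one produces an answer, and queries that no level can answer are queued for Level 5. The sketch below rests on exactly that assumption, that each level either answers or passes the query down. All handlers are hypothetical stubs, not AINI’s modules.

```python
from typing import Callable, List, Optional, Tuple

Handler = Callable[[str], Optional[str]]

# Hypothetical stand-ins for Levels 1-4; each returns an answer or None.
FAQ = {"what is bird flu": "(FAQ answer text)"}
LEVELS: List[Tuple[int, str, Handler]] = [
    (1, "NLUR", lambda q: None),
    (2, "FAQChat", FAQ.get),
    (3, "Index Search", lambda q: None),
    (4, "PMCBR", lambda q: "Hi there : )" if q == "hi" else None),
]


def spell_check(query: str) -> str:
    """Level 0 stand-in: lower-case and fix one known typo."""
    return " ".join({"flo": "flu"}.get(w, w) for w in query.lower().split())


def answer_query(query: str, unanswered: List[str]):
    query = spell_check(query)            # Level 0 always runs first
    for level, name, handler in LEVELS:   # Levels 1-4, top-down
        answer = handler(query)
        if answer is not None:
            return level, name, answer
    unanswered.append(query)              # Level 5: queued for supervised learning
    return 5, "Supervised Learning", None


pending: List[str] = []
print(answer_query("What is bird flo", pending))  # (2, 'FAQChat', '(FAQ answer text)')
print(answer_query("Tell me a joke", pending))    # (5, 'Supervised Learning', None)
```

Because each level is just an entry in `LEVELS`, a level can be swapped or removed without touching the cascade itself.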


3.5.4.2 Spell Checker 


Level 0 is a critical level at which the system recognises frequently made typographical errors, spelling mistakes and misconceptions in users’ queries. It analyses all terms in the user’s query to identify the most likely intention. Its main feature is to suggest possible replacements for any misspelled word.

To correct spelling errors, a similarity function between two words is used [199]. A misspelled word is any word that is not stored in our dictionary. With this function, the system can either replace the misspelled word or generate a list of words similar to it, using equation 3.1 [199]:

$$ SIM = \frac{\sum_{i=1}^{k} \left[ 3\min(z_{i1}, z_{i2}) - \max(z_{i1}, z_{i2}) \right]}{\sum_{i=1}^{k} z_{i1} + \sum_{i=1}^{k} z_{i2}} \qquad (3.1) $$

where $z_{i1}$ and $z_{i2}$ represent the number of occurrences of the $i$-th character ($i = 1, \dots, k$) in the first and the second words respectively. The similarity value is standardised within the interval [-1, 1]. It equals 1 when the two words contain exactly the same characters (for example, when they are identical) and -1 when they have no characters in common.
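Equation 3.1 can be implemented directly from character counts. The sketch below computes SIM and picks the best dictionary candidate for a misspelled word. The tie-break by word frequency and the small frequency dictionary are hypothetical illustrations. For “flo” against “flu”, the shared characters f and l contribute 2 each and the unmatched o and u contribute -1 each, giving SIM = 2/6 = 1/3.

```python
from collections import Counter


def sim(word1: str, word2: str) -> float:
    """Equation (3.1): character-occurrence similarity in [-1, 1]."""
    z1, z2 = Counter(word1.lower()), Counter(word2.lower())
    # Characters absent from both words contribute 3*0 - 0 = 0, so summing
    # over the characters that occur equals summing over all k characters.
    chars = set(z1) | set(z2)
    numerator = sum(3 * min(z1[c], z2[c]) - max(z1[c], z2[c]) for c in chars)
    denominator = sum(z1.values()) + sum(z2.values())
    if denominator == 0:
        raise ValueError("at least one word must be non-empty")
    return numerator / denominator


def suggest(word: str, dictionary: dict) -> str:
    """Best dictionary word by SIM, ties broken by (hypothetical) word frequency."""
    if word in dictionary:
        return word
    return max(dictionary, key=lambda cand: (sim(word, cand), dictionary[cand]))


WORD_COUNTS = {"flu": 120, "fly": 15, "bird": 80, "fish": 40}  # hypothetical counts
print(round(sim("flo", "flu"), 3))  # 0.333
print(suggest("flo", WORD_COUNTS))  # flu
```

SIM ignores character order, so anagrams score 1. With a larger dictionary the top candidate can also change: “flow” scores 5/7 against “flo”, above “flu”. How AINI combines SIM with word occurrences is not specified here.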

The spell checker is based on the occurrences of all words found in the dictionary. It can suggest common spellings for proper nouns (names and places) that might not appear in a standard spell check program or dictionary. The system automatically checks whether the user is using the most commonly spelled form of each word in the query. For example, when the question “What is bird flo?” is asked, the spelling checker detects the wrongly spelled word and replaces “flo” with “flu”. Whether or not it suggests an alternative spelling, the spell checker returns any results that match the query. After this verification stage, the query goes to Level 1.
3.5.4.3 Natural Language Understanding and Reasoning (NLUR)

Natural Language Understanding and Reasoning (NLUR) at Level 1 is the most important level of the AINI system. It refers to the process of constructing machine-understandable meaning representations from natural language inputs. A preliminary definition of what “understanding” natural language implies was introduced in [200]. It suggests that “an intelligent person or program should be able to answer [...] questions based on the information in [...] (a) story”.

Hubert Dreyfus [201] argued in his controversial book “What Computers Can't Do” (and its revised edition, “What Computers Still Can't Do”) that computers cannot achieve human-level intelligence, including the ability to understand human language. The fundamental reason he gives is that computers cannot use any formal symbolic system to adequately model the vast background knowledge that humans take for granted in interpreting and reasoning. Since the publication of Dreyfus’s book, AI research in general has gradually shifted from trying to build a general problem solver [202] to solving specific problems in narrowly restricted domains.

SHRDLU 61 [92] is a classical natural language understanding program written in MacLisp at the M.I.T. Artificial Intelligence Laboratory in 1968-70. SHRDLU uses a top-down, left-to-right parser that analyses a pattern, identifies its structure, and recognises its relevant features and grammar. However, the system can only answer simple queries about the current state of its toy block world. Nevertheless, SHRDLU demonstrated the promising future of NLU research at that time.

It is recognised that existing conversation systems lack some essential features. NLU, for instance, was used in previous experiments by Badler et al. [203], Kairai [204-206] and Façade [207-209], where it served as a theory of human language understanding processes. However, the results with respect to practical working systems were still lacking in the area of reasoning.

61 http://hci.stanford.edu/~winograd/shrdlu/
In previous papers [191, 192], the NL-query is based on an NLUR comprising three parts:

(a) understanding documents to produce facts that are integrated into the knowledge base;
(b) understanding questions;
(c) reasoning with facts and rules to look for answers in the knowledge base.

The architecture is depicted in Figure 3.19. Natural language understanding is thus applied through understanding modules for both the question and the information source.
Figure 3.19: Natural Language Understanding and Reasoning Architecture
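The three parts of the NLUR (facts from documents, understood questions, and reasoning over facts and rules) can be illustrated with a toy triple store. In the sketch below, the facts are written by hand from AINI’s reply in Figure 3.17 rather than produced by a parser. The relation names and the single rule are hypothetical, and a question arrives already “understood” as a triple with an unknown object. It does not reproduce AINI’s network-based reasoning (path reduction, constraint relaxation and so on); it only shows the data flow, including a simple explanation on failure.

```python
from typing import Set, Tuple

Fact = Tuple[str, str, str]  # (subject, relation, object)

# (a) Facts as document understanding might produce them (hand-written here).
FACTS: Set[Fact] = {
    ("SARS", "first_spread_at", "Metropole Hotel"),
    ("Metropole Hotel", "located_in", "Hong Kong"),
}


def apply_rules(facts: Set[Fact]) -> Set[Fact]:
    """(c) One pass of a hypothetical rule: spread at X, X located in Y => spread in Y."""
    derived = set(facts)
    for subject, relation, place in facts:
        if relation != "first_spread_at":
            continue
        for inner, relation2, region in facts:
            if inner == place and relation2 == "located_in":
                derived.add((subject, "first_spread_in", region))
    return derived


def answer(question: Fact, facts: Set[Fact]) -> str:
    """(b) The question is a triple whose unknown object is '?'."""
    subject, relation, _ = question
    found = sorted(o for s, r, o in apply_rules(facts) if s == subject and r == relation)
    if found:
        return ", ".join(found)
    return f"I have no '{relation}' fact about {subject}."  # explanation on failure


print(answer(("SARS", "first_spread_in", "?"), FACTS))  # Hong Kong
print(answer(("H5N1", "first_spread_in", "?"), FACTS))
```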


The design of the NLU mechanism took into consideration the various levels of analysis up to the discourse level [210]. Techniques exist for the various stages of analysis in NLU. However, they have mostly been studied separately, without regard for the compatibility of the algorithms that must be integrated for full natural language understanding. Hence, for this research, a series of algorithms based on established theories has been proposed for the various stages of analysis, designed to work seamlessly together.

For syntax analysis, an existing external sentence-parsing module called X-MINIPAR is used (see details in Section 5.2.1). X-MINIPAR is a modified version of the off-the-shelf MINIPAR [211]. It has been enhanced so that the parser loads its hash tables once and stays resident as a background daemon process. This allows it to parse multiple sentences without reloading the hash tables each time. The original MINIPAR has been patented in the United States of America 62 and is a broad-coverage parser for the English language. It parses newspaper text at about 500 words per second. In an evaluation with the SUSANNE corpus, it achieves about 88% precision and 80% recall with respect to dependency relationships [211].

A typical full-discourse NLUR system consists of two subsystems, NLU and 
network-based advanced reasoning, as shown in Figure 3.19. The NLU subsystem is 
responsible for reading and understanding two things: questions from users, and 
sentences of processed news articles from a news repository. This is carried out in four 
phases by four natural language processing modules, namely (a) sentence parsing, 
(b) named-entity recognition, (c) relation inference, and (d) discourse integration. 

The network-based advanced reasoning subsystem, in turn, is responsible for discovering 
the valid answer and generating an unambiguous answer or explanation in response to 
users’ questions [212]. This process is executed in five phases by five modules, namely 
(a) network-to-path reduction, (b) selective path matching, (c) relaxation of event 
constraint, (d) explanation on failure, and (e) template-based response generation. 

The network-to-path reduction module collapses the query network into sets of path 
sequences to reduce the complexity of discovering the answer. Its output is two sets of 
path sequences, which are used by the selective path matching module to discover the 
answer from the semantic network through a 

62 http://www.patentstorm.us/patents/7146308-description.html 
series of conditional path unifications. To go beyond the literal matching of path 
sequences, ontological information is used to take into account events that are 
hierarchically equivalent; this is performed by the relaxation of event constraint module. 
If selective path matching fails to discover a valid answer, an explanation or justification 
is dynamically generated by the explanation on failure module as an alternative response, 
based on the context of the question and the current state of the semantic network. If 
answers can be validly discovered, readable natural language responses are generated by 
the template-based response generation module. Appendix B shows the full sentence 
parsing by the NLUR for the sentence “Bird flu did occur in which countries?” The 
following variant questions generate the same results: 

• Which countries has bird flu occurred in? 

• In which countries has bird flu occurred? 

• In which countries did bird flu occur? 

• Where has bird flu happened? 

• Which countries has bird flu happened in? 

The NLUR algorithm found four matching results, namely “Malaysia, Germany, Thailand 
and Vietnam”, from different trustworthy websites 63, as assessed by the Web Knowledge 
Trust Model (WKTM) and extracted by AKEA (see Chapter 4). 
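The reasoning phases described above can be illustrated with a small Python sketch. It is not the NLUR implementation. The following are all assumptions made for illustration:

- the network encoding: a dictionary mapping each node to its parent, with role nodes such as agent and location placed between the entities and the event;
- the `?` prefix for query variables;
- the one-entry ontology EQUIVALENT_EVENTS;
- the function names and the toy facts, which are not AKEA output.

The sketch shows network-to-path reduction, selective path matching by unification, relaxation of the event constraint, a much-simplified explanation on failure, and a template-based response.

```python
from collections import defaultdict

# Hypothetical ontology fragment: events treated as hierarchically equivalent.
EQUIVALENT_EVENTS = {"happen": "occur"}


def leaf_to_root_paths(parent):
    """Network-to-path reduction: one path per leaf node, walked up to the root.

    `parent` maps every node to its parent node (the root maps to None).
    """
    children = defaultdict(list)
    for node, par in parent.items():
        if par is not None:
            children[par].append(node)
    paths = []
    for leaf in sorted(n for n in parent if not children[n]):
        path, node = [], leaf
        while node is not None:
            path.append(node)
            node = parent[node]
        paths.append(tuple(path))
    return paths


def canonical(token):
    """Relaxation of event constraint: map equivalent events to one name."""
    return EQUIVALENT_EVENTS.get(token, token)


def unify_path(query_path, fact_path, bindings):
    """Unify one query path with one fact path; return extended bindings or None."""
    if len(query_path) != len(fact_path):
        return None
    extended = dict(bindings)
    for q, f in zip(query_path, fact_path):
        if q.startswith("?"):  # query variable: bind it, or check the existing binding
            if extended.setdefault(q, f) != f:
                return None
        elif canonical(q) != canonical(f):
            return None
    return extended


def selective_match(query_paths, fact_paths, bindings=None):
    """Every query path must unify with some fact path under shared bindings."""
    bindings = {} if bindings is None else bindings
    if not query_paths:
        return bindings
    for fact_path in fact_paths:
        extended = unify_path(query_paths[0], fact_path, bindings)
        if extended is not None:
            result = selective_match(query_paths[1:], fact_paths, extended)
            if result is not None:
                return result
    return None


def answer(query_net, fact_nets, variable, template="Matching answers: {}."):
    query_paths = leaf_to_root_paths(query_net)
    found = []
    for net in fact_nets:
        bindings = selective_match(query_paths, leaf_to_root_paths(net))
        if bindings is not None:
            found.append(bindings[variable])
    if not found:
        # Explanation on failure (greatly simplified).
        return "No event in the knowledge base matches this question."
    return template.format(", ".join(found))  # template-based response


# "In which countries did bird flu occur?" as a toy query network.
query = {"occur": None, "agent": "occur", "location": "occur",
         "bird flu": "agent", "?country": "location"}
facts = [  # toy fact networks
    {"occur": None, "agent": "occur", "location": "occur",
     "bird flu": "agent", "Malaysia": "location"},
    {"happen": None, "agent": "happen", "location": "happen",
     "bird flu": "agent", "Thailand": "location"},
    {"occur": None, "agent": "occur", "location": "occur",
     "SARS": "agent", "Vietnam": "location"},
]
print(answer(query, facts, "?country"))
```

The second fact uses the event “happen”, yet it still matches because the relaxation step maps it onto “occur”. The third fact fails because its agent is not bird flu.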


3.5.4.4 Frequently Asked Questions (FAQChat) 

In Level 2, sophisticated NLP and logical inference are skipped because they have 
already been performed in Level 1. FAQs (Frequently Asked Questions) are designed to 
capture the logical ontology of a given, specific domain. Any natural language interface 


63 Malaysia - www.whitehouse.gov/news/releases/2005/11/20051101-1.html 
Germany - www.pandemicflu.gov/general/workshopmorning.html 
Vietnam - www.who.int/csr/disease/avian_influenza/avian_faqs/en/index.html 
Thailand - www2a.cdc.gov/HAN/ArchiveSys/ViewMsgV.asp?AlertNum=00221 
to an FAQ is constrained to reply with the given answers. There is therefore no need for 
NL generation to recreate well-formed answers, or for deep analysis or logical inference 
to map user input questions onto this logical ontology; a simple (but large) set of 
stimulus-response matching rules suffices. This simple approach works best when the 
user's conversation with the CA is likely to be constrained to a specific topic (in our case, 
conversations were restricted to crisis communication on the bird flu pandemic). With 
this approach, the system can give direct answers or suggest related links using the URL 
push technique [156], whereas Google only gives links. In addition, because FAQChat 
draws on the logical ontology of a specific domain stored in the database, it returns fewer 
results than Google does from the Web, which saves time spent browsing and searching 
[213]. 
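An FAQ level of this kind amounts to a large table of stimulus-response rules. The Python sketch below assumes that questions are normalised (upper-cased, punctuation removed) and then looked up exactly. The class name FAQChat, the normalisation rule and the placeholder URL are illustrative assumptions, not the actual FAQChat implementation. A return value of None signals that the query should fall through to the next level.

```python
import re


def normalise(text):
    """Upper-case the text and keep only letters and digits, single-spaced."""
    return " ".join(re.findall(r"[A-Z0-9]+", text.upper()))


class FAQChat:
    """Stimulus-response FAQ lookup with an optional pushed URL per answer."""

    def __init__(self):
        self.rules = {}  # normalised question -> (answer, url or None)

    def add(self, question, answer, url=None):
        self.rules[normalise(question)] = (answer, url)

    def reply(self, question):
        hit = self.rules.get(normalise(question))
        if hit is None:
            return None  # no rule: let the next NL-query level try
        answer, url = hit
        if url is None:
            return answer
        return f"{answer} More information at {url}"  # "URL push"


faq = FAQChat()
faq.add("What is bird flu?",
        "Bird flu or avian influenza is an infectious disease of birds.",
        url="http://example.org/bird-flu-faq")  # placeholder URL
print(faq.reply("what is BIRD FLU"))
```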

3.5.4.5 Index Search 

Index search is based on information from Internet resources; more precisely, on 
structured information that has already been indexed. An index entry can be as simple as 
an author's name or as complex as a geographic code or a controlled-vocabulary subject 
heading. From a technological perspective, Level 3 relies on a mix of linguistic rules and 
probabilistic or statistical principles. At one end of the spectrum, solutions apply 
linguistic rules to “clean” the document of any specific formatting and perform 
noun-phrase or verb-phrase analyses in the repository. At the other end, solutions rely on 
a simple statistical keyword-matching algorithm. 
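At the statistical end of that spectrum, keyword matching can be as simple as an inverted index that scores documents by the number of query keywords they contain. The Python sketch below illustrates only that end. The stop-word list, the scoring rule and the document identifiers are assumptions, and no linguistic clean-up or phrase analysis is attempted.

```python
import re
from collections import defaultdict

# Small illustrative stop-word list (an assumption, not AINI's).
STOPWORDS = {"a", "about", "an", "and", "anything", "have", "heard",
             "in", "is", "of", "the", "to", "what", "you"}


def keywords(text):
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


class KeywordIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # keyword -> ids of documents containing it

    def add(self, doc_id, text):
        for word in keywords(text):
            self.postings[word].add(doc_id)

    def search(self, query, limit=3):
        """Rank documents by how many distinct query keywords they contain."""
        scores = defaultdict(int)
        for word in set(keywords(query)):
            for doc_id in self.postings.get(word, ()):
                scores[doc_id] += 1
        # Highest score first; ties broken alphabetically by document id.
        return sorted(scores, key=lambda d: (-scores[d], d))[:limit]


index = KeywordIndex()
index.add("h5n1", "Bird flu: H5N1 avian flu virus, a global flu pandemic")
index.add("flu-shot", "Seasonal flu vaccination")
index.add("sars", "SARS epidemic")
print(index.search("Have you heard anything about bird flu?"))
```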

3.5.4.6 Pattern Matching & Case-Based Reasoning (PMCBR) 

Level 4 is based on empirical techniques called pattern matching and case-based 
reasoning (PMCBR). These techniques descend from the early ELIZA [35] and ALICE 
[214] programs and use the AIML algorithm. ALICE, the AIML-based system, was 
top-ranked among the conversation engines in the Loebner Prize, winning it three times, 
in 2000, 2001 and 2004 [40]. 

To tackle the four limitations of ELIZA (see Section 2.4.1.1), Wallace [214] proposed 
expanding the memory structure of a conversation system by using a script specification 
to program it. AIML is a derivative of XML, the versatile eXtensible Markup Language. 
The most important AIML [152] unit is <aiml>, the tag that begins and ends an AIML 
document. Some of the other tags are given below: 

• <category>, the tag that marks a "unit of knowledge" in the system’s memory 
structure. 

• <pattern>, the tag that contains a simple input pattern rule that matches what a 
user may type. 

• <topic>, the tag that contains the current conversation topic pattern rule. 

• <that>, the tag that refers to the system’s previous reply as a history pattern rule. 

• <template>, the tag that contains the response to a user input. 

AIML consists of data objects called AIML objects, which are made up of units called 
topics and categories. The topic is an optional top-level element; it has a name attribute 
and a set of categories related to that topic. Categories are the basic units of knowledge 
in AIML. Each category is a rule for matching an input and converting it to an output, 
and consists of a pattern, which represents the user input, and a template, which 
represents AINI’s answer. The AIML pattern language is simple, consisting only of 
words, spaces, and the wildcard symbols _ and *. The words may consist of letters and 
numerals, but no other characters. Words are separated by a single space, and the 
wildcard characters function like words. The pattern language is case invariant. The 
pattern matching technique is based on finding the best, longest pattern match. 
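The matching behaviour described above can be sketched in Python as a word-level tree searched in priority order. The sketch assumes the priority order used by standard AIML 1.0.1 interpreters: the _ wildcard first, then an exact word, then the * wildcard, with each wildcard consuming one or more words. This is one common reading of “best” match. The class name and the sentinel key are hypothetical, and the patterns are toy examples.

```python
import re

TEMPLATE = "<template>"  # sentinel key; cannot collide with a normalised word


def normalise(text):
    """Case-invariant matching: upper-case words of letters and digits only."""
    return re.findall(r"[A-Z0-9]+", text.upper())


class Graphmaster:
    def __init__(self):
        self.root = {}

    def add(self, pattern, template):
        node = self.root
        for word in pattern.upper().split():
            node = node.setdefault(word, {})
        node[TEMPLATE] = template

    def match(self, text):
        """Return (template, wildcard captures) for the best match, or None."""
        return self._match(self.root, normalise(text), [])

    def _match(self, node, words, stars):
        if not words:
            return (node[TEMPLATE], stars) if TEMPLATE in node else None
        for key in ("_", words[0], "*"):  # AIML priority: _ > exact word > *
            child = node.get(key)
            if child is None:
                continue
            if key in ("_", "*"):
                # A wildcard consumes one or more words, shortest first.
                for i in range(1, len(words) + 1):
                    found = self._match(child, words[i:], stars + [" ".join(words[:i])])
                    if found:
                        return found
            else:
                found = self._match(child, words[1:], stars)
                if found:
                    return found
        return None


g = Graphmaster()
g.add("WHAT IS YOUR NAME", "My name is Aini.")
g.add("WHAT IS *", "I do not know what <star/> is.")
g.add("_ BIRD FLU", "Bird flu is avian influenza.")
print(g.match("What is your name?"))  # exact words beat *
print(g.match("What is SARS?"))       # falls back to WHAT IS *
print(g.match("What is bird flu?"))   # _ is tried before any other branch
```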



An AIML representation is a series of key stimuli (known as patterns) and responses to 
those key phrases (known as templates). In ELIZA’s terms, the <pattern> part 
corresponds to the decomposition rule and the <template> part to the reassembly rule. 
Figure 3.20 gives an example of AIML memory units [152, 215] in a topic about “name”. 
The <set> and <get> tags are used to store information during a conversation. AIML 
therefore offers many more possibilities for replying based on the current topic and 
conversation history, and it can be used to create new content through dialogue analysis, 
as shown in Figure 3.20. 

<category>
  <pattern>WHAT IS YOUR NAME</pattern>
  <that>*</that>
  <template>My <set name="topic">name</set> is <bot name="name"/>.</template>
</category>

<topic name="NAME">

  <category>
    <pattern>CAN I CALL YOU *</pattern>
    <that>MY NAME IS *</that>
    <template>
      <random>
        <li><star/>? Huh! Like I've told you my name is <bot name="name"/>.</li>
        <li>You can call me whatever you like.</li>
      </random>
    </template>
  </category>

  <category>
    <pattern>I HATE *</pattern>
    <that>MY NAME IS *</that>
    <template>
      <random>
        <li>I don't care, you can only call me <bot name="name"/>.</li>
        <li>Why? A sad memory perhaps?</li>
      </random>
    </template>
  </category>

</topic>


Example responses: 

User: What is your name? 
Alice: My name is Alice. 
User: Can I call you Madonna? 
Alice: Madonna? Huh! Like I've told you my name is Alice. 
User: I hate that name. 
Alice: Why? A sad memory perhaps? 


Figure 3.20: AIML Categories and Pattern Matching 
What distinguishes AIML from simple pattern matching is its support for recursion. There 
are six reasons why recursion might be used in AIML [215]; the most important is 
symbolic reduction, which transforms a complex phrase into a simpler one, as shown in 
Figure 3.21. 

<category>
  <pattern>DO YOU KNOW WHO * IS</pattern>
  <template><srai>WHO IS <star/></srai></template>
</category>

Figure 3.21: AIML Symbolic Reduction (SRAI) Technique 
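Symbolic reduction can be sketched as a recursive call back into the matcher. The Python sketch below makes several assumptions for illustration:

- patterns are compiled to regular expressions;
- exact patterns are tried before wildcard patterns;
- only the first <star/> is supported;
- a depth limit guards against <srai> cycles.

These choices and all names are assumptions, not how AINI or ALICE actually implement <srai>.

```python
import re

# Toy categories: (pattern, template).
CATEGORIES = [
    ("DO YOU KNOW WHO * IS", "<srai>WHO IS <star/></srai>"),
    ("WHO IS *", "I have no information about <star/>."),
    ("WHO IS SHIGERU OMI", "Shigeru Omi is the WHO official who issued the H5N1 warning."),
]
# Try patterns with fewer wildcards first, then longer patterns first.
RULES = sorted(CATEGORIES, key=lambda c: (c[0].count("*"), -len(c[0].split())))
MAX_DEPTH = 10


def to_regex(pattern):
    parts = ["(.+)" if w == "*" else re.escape(w) for w in pattern.split()]
    return re.compile("^" + " ".join(parts) + "$")


def respond(text, depth=0):
    if depth > MAX_DEPTH:
        return "I do not understand."  # guard against <srai> loops
    words = " ".join(re.findall(r"[A-Z0-9]+", text.upper()))
    for pattern, template in RULES:
        m = to_regex(pattern).match(words)
        if m:
            star = m.group(1) if m.groups() else ""
            out = template.replace("<star/>", star)
            # Symbolic reduction: replace each <srai>...</srai> with the
            # response to its (simpler) contents.
            return re.sub(r"<srai>(.*?)</srai>",
                          lambda s: respond(s.group(1), depth + 1), out)
    return "I do not understand."


print(respond("Do you know who Shigeru Omi is?"))
print(respond("Do you know who Isabel is?"))
```

Because the reduced sentence re-enters the matcher, one category written for WHO IS * (or for a specific name) serves every paraphrase that reduces to it.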

This AIML schema is used to build AINI’s natural language query engine. Our approach 
uses an enhanced AIML algorithm to handle conversations through the PMCBR 
techniques. The PMCBR algorithm uses a relational database to store a binary prefix tree 
of all the pattern strings listed in the AIML files. (Tree patterns of this kind are a general 
tool in programming languages for processing data based on its structure.) A recursive 
function written in Perl searches the tree for a string that matches the input query and, 
once a match is found, dynamically retrieves the corresponding template from a second 
table. The relational database thus serves as a rapid way of accessing information that is 
usually stored in AIML files. Because the prefix tree eliminates several preprocessing 
steps, it can return near-instantaneous responses to users. The purpose of PMCBR is to 
prolong the conversation and to make the interaction believable. 
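The storage scheme can be illustrated with a short Python sketch that uses the standard sqlite3 module in place of MySQL and Perl. It assumes a word-level prefix tree, with one row per tree node in a node table and the templates in a second table keyed by node id. The thesis describes a binary prefix tree but does not give its encoding, so the schema and all names here are assumptions. The recursive search tries an exact word before the * wildcard.

```python
import re
import sqlite3


def build(db, categories):
    """Store every pattern as a path of word nodes; templates go in a second table."""
    db.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, parent INTEGER, word TEXT)")
    db.execute("CREATE TABLE template (node_id INTEGER PRIMARY KEY, body TEXT)")
    db.execute("INSERT INTO node (id, parent, word) VALUES (0, NULL, '')")  # root
    for pattern, body in categories:
        node = 0
        for word in pattern.split():
            row = db.execute("SELECT id FROM node WHERE parent = ? AND word = ?",
                             (node, word)).fetchone()
            if row is None:
                node = db.execute("INSERT INTO node (parent, word) VALUES (?, ?)",
                                  (node, word)).lastrowid
            else:
                node = row[0]
        db.execute("INSERT OR REPLACE INTO template (node_id, body) VALUES (?, ?)",
                   (node, body))


def child(db, node, word):
    row = db.execute("SELECT id FROM node WHERE parent = ? AND word = ?",
                     (node, word)).fetchone()
    return None if row is None else row[0]


def search(db, node, words):
    """Recursive prefix-tree search: exact word first, then the * wildcard."""
    if not words:
        row = db.execute("SELECT body FROM template WHERE node_id = ?", (node,)).fetchone()
        return None if row is None else row[0]
    exact = child(db, node, words[0])
    if exact is not None:
        found = search(db, exact, words[1:])
        if found is not None:
            return found
    star = child(db, node, "*")
    if star is not None:
        for i in range(1, len(words) + 1):  # * consumes one or more words
            found = search(db, star, words[i:])
            if found is not None:
                return found
    return None


def reply(db, text):
    return search(db, 0, re.findall(r"[A-Z0-9]+", text.upper()))


db = sqlite3.connect(":memory:")
build(db, [("HI", "Hello there."),
           ("WHAT IS YOUR NAME", 'My name is <bot name="name"/>.'),
           ("WHAT IS *", "Let me look up <star/> for you.")])
print(reply(db, "Hi!"))
print(reply(db, "What is H5N1?"))
```

The returned template still contains AIML tags, which a separate step has to fill in.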

3.5.4.7 Supervised Learning Approach by Domain Expert 

Finally, in Level 5, to handle the situation in which no answer has been found, AINI 
generates a dynamic response and the unanswered question is allocated to the learning 
module. AINI’s learning approach is based on a stimulus-response model, which consists 
of two patterns: the stimulus and the appropriate response to that stimulus. When a 
stimulus-response pair is submitted for learning, AINI immediately captures the 
stimulus-response relationship and then repeats the response each time it receives the 
same stimulus. 

At this level, we propose supervised learning by a human domain expert. Human 
intervention can improve model matching in two ways. First, a human expert can set up a 
matching context by applying domain constraints or configuring heuristic parameters to 
speed up the matching. Second, a domain expert can correct errors during the matching 
procedure and follow up with training to avoid future errors. Therefore, compared with a 
fully automated approach, a domain-specific semi-automated approach that uses prior 
matching knowledge and domain knowledge can be expected to give better performance 
and accuracy. 

The domain expert is responsible for correcting matching errors and for picking the 
proper matching result, from a list of possible matches in the conversation log, for each 
unanswered question. The newly generated matching rules are then stored in, and used to 
upgrade, the FAQChat knowledge base in Level 2. A query passes through the levels 
from Level 0 to Level 5 until an answer is found for it. 
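The stimulus-response learning loop can be sketched as follows. In this Python sketch, the class and method names, the exact-match lookup on normalised text and the fallback sentence are assumptions. The expert's review step is modelled as a single method call that stores an approved response and clears the question from the unanswered log.

```python
import re
from dataclasses import dataclass, field


def normalise(text):
    return " ".join(re.findall(r"[A-Z0-9]+", text.upper()))


@dataclass
class LearningModule:
    knowledge: dict = field(default_factory=dict)   # stimulus -> response
    unanswered: list = field(default_factory=list)  # log reviewed by the expert

    def respond(self, stimulus):
        response = self.knowledge.get(normalise(stimulus))
        if response is None:
            self.unanswered.append(stimulus)          # keep it for the expert
            return "OK. I would do a search for it."  # dynamic fallback reply
        return response

    def expert_review(self, stimulus, response):
        """A domain expert approves a response for a logged stimulus."""
        key = normalise(stimulus)
        self.knowledge[key] = response
        self.unanswered = [s for s in self.unanswered if normalise(s) != key]


module = LearningModule()
question = "I am more interested on chronic disease."
print(module.respond(question))  # unanswered, so it is logged
module.expert_review(question, "Hypothetical expert-approved answer.")
print(module.respond("I am more interested on chronic disease"))
```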

In summary, the multilevel natural language query approach includes the following 
stages: 

• AINI parses the user’s input by reading in sentences one by one; each sentence is 
converted to a list of words to be processed by the spell checker, which proposes 
alternative words from the dictionary if it finds any typing errors. Once this step is 
passed, AINI attempts to formulate a reply to the sentence by calling the next module; 

• The sentence is then sent to the NLUR module, which parses it to produce 
grammatical categories and grammatical relationships such as noun phrases and verb 
phrases. Named-entity recognition then looks up each keyword in the gazetteer for 
relation inference and discourse integration. If the sentence is parsed successfully, 
network-based advanced reasoning carries out network-to-path reduction, which 
obtains from the query network the set of path sequences from each leaf node to the 
root node; 

• If an empty string is returned from the database, AINI moves on to the FAQ module, 
an FAQ question-answering system. At this stage, sophisticated natural language 
processing and logical inference are skipped, as they have already been performed by 
the previous module; 

• If no database match is found, AINI moves on to the next module, Index Search, 
which relies on a mix of linguistic rules and probabilistic or statistical principles; 

• If this also fails, AINI tries to detect whether the sentence is a trick question by 
calling the PMCBR module, the technique used in ELIZA and ALICE. In this case, 
AIML technology responds with suitable witty replies; 

• If even this module fails (which could happen if all of its witticisms have already been 
used), AINI’s routine for randomly generating a reply is invoked. At this stage, 
checking the result remains the responsibility of the domain expert, who can correct 
errors and pick a proper matching response, from a list of possible matches, for each 
question in the conversation log that AINI left unanswered. This is expected to improve 
performance and accuracy in future conversations. 
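The level-by-level fall-through summarised above can be sketched as a simple cascade. In this Python sketch, each level is a function that returns an answer or None. The stub levels, the one-entry correction table standing in for the spell checker and the fallback list are assumptions made for illustration.

```python
import random
import re
from typing import Callable, List, Optional, Tuple

Level = Callable[[str], Optional[str]]

CORRECTIONS = {"TODA": "TODAY"}  # stand-in for the Level 0 spell checker
FALLBACKS = ["OK. I would do a search for it.",
             "Did I misunderstand your meaning?",
             "That's an interesting question."]


def spell_check(text: str) -> str:
    words = re.findall(r"[A-Z0-9]+", text.upper())
    return " ".join(CORRECTIONS.get(w, w) for w in words)


def cascade(text: str, levels: List[Level], log: List[str]) -> Tuple[int, str]:
    """Try Levels 1-4 in order; fall back to Level 5 if none answers."""
    query = spell_check(text)                         # Level 0
    for number, level in enumerate(levels, start=1):
        answer = level(query)
        if answer is not None:
            return number, answer
    log.append(query)                                 # for the domain expert
    return 5, random.choice(FALLBACKS)                # Level 5


faq = {"WHAT IS BIRD FLU": "Bird flu or avian influenza is an infectious disease of birds."}
levels: List[Level] = [
    lambda q: None,                                   # Level 1: NLUR (stub)
    faq.get,                                          # Level 2: FAQChat
    lambda q: None,                                   # Level 3: Index Search (stub)
    lambda q: "Hello there." if q == "HI" else None,  # Level 4: PMCBR (stub)
]
unanswered: List[str] = []
print(cascade("What is bird flu?", levels, unanswered))
print(cascade("Hi", levels, unanswered))
print(cascade("I am more interested on chronic disease.", levels, unanswered))
```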


3.5.4.8 Proxy Conversation Example 3 

In this proxy conversation [1] 64, the H5N1 bird flu pandemic is the domain-specific 
research focus. The H5N1 pandemic is an increasingly important real-world application 
with which to demonstrate the utility of our research. According to the Wall 

64 This project was supported by the Murdoch University Division of Arts Research 
Excellence Grant Scheme 2005/06. 
Street Journal Online [216], this pandemic could be even worse than the deadliest 
pandemic of the past century, in 1918, which killed at least 20 million people at a time 
when the world’s population was smaller and travelled less. In addition, the World 
Health Organization estimates that the H5N1 virus could infect up to 30 percent of the 
world's population. Shigeru Omi [217], the WHO official who issued the warning, said 
that estimates of 2-7 million deaths were "conservative" and that the maximum could go 
as high as 50 million deaths. This proxy conversation example also investigates how 
AINI’s framework can be adapted and extended from previous research, which used a 
knowledge domain on the SARS epidemic. 

In this example, bird flu pandemic knowledge has been predefined in the domain- 
specific data tier. Based on the type of input provided by the user, the agent’s response 
state moves from one domain knowledge base and NL-query Level to another. An 
example of the interaction of domain knowledge model and NL-Query in a conversation 
between Isabel and AINI on the bird flu pandemic is shown in Figure 3.22. 


Nr   ID       Level   Message
1    USER     L4      Hi
2    AINI     L4      Hello there.
3    USER     L4      What is your name?
4    AINI     L4      My name is Aini. What is your name?
5    USER     L4      Isabel
6    AINI     L4      Nice to meet you Isabel! Are you a man or a woman?
7    Isabel   L4      Woman
8    AINI     L4      Thanks for telling me your gender, Isabel. I am a woman too.
9    Isabel   L0      How are you toda?
10   AINI     L0      I am doing very well. How are you Isabel?
11   Isabel   L4      Fine
12   AINI     L4      Great to hear that.
13   Isabel   L3      Have you heard anything about bird flu?
14   AINI     L3      Did you mean H5N1 avian flu virus, a global flu pandemic?
15   Isabel   L3      Yes
16   AINI     L3      Anything I can help?
17   Isabel   L2      Just want to know what is bird flu pandemic?
18   AINI     L2      Bird flu or avian influenza is an infectious disease of
                      birds ranging from mild to severe form of illness. Some
                      forms of bird flu can cause illness to humans.
19   Isabel   L1      Can a pandemic be averted and how many has been confirmed?
20   AINI     L1      Cumulative number of confirmed human cases of avian
                      influenza A/(H5N1) reported by WHO. More information at
                      http://www.who.int/csr/disease/avian_influenza/country/cases_table_2006_05_23/en/index.html
21   Isabel   L4      Thanks for your information, can we talk about other topic
22   AINI     L4      My pleasure. I'm not sure how this related to Bird Flu?
23   Isabel   L5      I am more interested on chronic disease.
24   AINI     L5      OK. I would do a search for it.
                      (The domain expert will update this unanswered question
                      from the logs, and it will be automatically upgraded
                      into FAQChat.)
                      Conversation continues...

Figure 3.22: An Example of Proxy Conversation Log on H5N1 based on the NL-Query 


In the conversation log in Figure 3.22, dialogue pairs #1 to #8 and #11 to #12 used the 
open-domain NL-Query Level 4, where the pattern matching and case-based reasoning 
(PMCBR) approach is applied. In dialogue pair #9 and #10, the NL-Query detected the 
misspelled word “toda” and corrected it to “today” before replying. In dialogue pairs #13 
to #16, NL-Query Level 3 was applied: the search identifies keywords or phrases in the 
indexed documents using probabilistic or statistical approaches. In dialogue pair #17 and 
#18, the FAQChat approach captured the logical ontology of the given domain; at this 
level, Level 2, FAQChat is constrained to reply with the given answers, without natural 
language (NL) generation to recreate well-formed answers. In dialogue pair #19 and #20, 
by contrast, domain-specific full-discourse NLUR through the network-based advanced 
reasoning technique was used. Dialogue pair #23 and #24 shows that when AINI is 
unable to answer the user’s question, she forwards a random statement such as “I would 
do a search for it.”, “Did I misunderstand your meaning?”, “That's an interesting 
question.” or “I’ll come back to that in a minute”. These exchanges are monitored and 
submitted to the unanswered conversation logs data tier. At this level, Level 5, a domain 
expert is responsible for picking the proper matching result from a list of possible 
matches, and the newly generated matching rules are then stored and upgraded into the 
domain-specific knowledge set. Another significant result appears in dialogue pairs #21 
and #22: the user, Isabel, had control of the conversation, although AINI reminded her of 
the topic of the current presentation, Bird Flu. In addition, in Level 5, the domain expert 
can also link the answer to a relevant source on the Internet using the “URL Push” 
technique. This makes the conversation more interesting and ensures that the information 
forwarded to the user is up to date. 

3.6 Adaptability of AINI’s Framework to Other Domain Applications 

This section demonstrates how the migration from the SARS domain to the Bird Flu 
domain is accommodated by AINI’s framework based on RESM (described in Section 
3.2). This embodiment of the framework uses a modified N-tiered architecture, which 
differs significantly from the previous client-server design. 

The migration from the SARS epidemic to the Bird Flu pandemic domain application 
illustrates the characteristics of AINI’s framework: it is reusable, extensible, scalable and 
modular. The system can also support legacy algorithms that represent significant 
investments. Figure 3.23 shows the modules and illustrates the flow of control for the 
AINI application. This should be compared with Figure 3.3, AINI’s conversation 
architecture. The channel services tier and domain service tier, highlighted in grey and 
circled, as well as the CA interface at the agent body, are components of the AINI 
framework, as distinct from this independent implementation. Both the domain 
knowledge and the anthropomorphic interaction module parts of the framework have 
been extended to cater for the required application functionality. 
Figure 3.23: An Adaptive AINI Framework Architecture 

Figure 3.24: An Example of Migration from SARS to Bird Flu Domain Application 


In addition, embedding the anthropomorphic interaction module into existing websites, 
or integrating it into mobile services or other network protocols, can considerably cut 
development time, as shown in Figure 3.24. For instance, to deploy AINI into an existing 
website, the developer only needs to add a single line of code, an <iframe> tag, to any 
HTML page, as shown in Figure 3.25. 


<iframe src="http://ainibot.org/aini.html" width="590" height="350" frameborder="0" name="avatar_frame" scrolling="no"></iframe>

Figure 3.25: Deploying the AINI object into an Existing Website 


[Table] aini @ aiml (localhost)

id   bot   name            value
1    1     name            Aini
2    1     gender          female
3    1     master          Ong Sing goh
4    1     birthday        1 March 2005
5    1     birthplace      Perth, Australia
6    1     boyfriend       Mahathir
7    1     favoritebook    Don't Read Me (ALICE)
8    1     favoritecolor   transparent
9    1     favoriteband    rubber
10   1     favoritefood    knowledge
11   1     favoritesong    your voice
12   1     favoritemovie   your life story
13   1     forfun          talk to you
14   1     friends         you
15   1     girlfriend      Eliza and Alice
16   1     kindmusic       all
17   1     location        here
18   1     looklike        you
19   1     question        What?
20   1     sign            Virtual
21   1     talkabout       anything
22   1     wear            nothing
23   1     website         http://ainibot.murdoch.edu.au
24   1     email           ainibot@hotmail.com
25   1     language        any

UPDATE `aini` SET `value`='Don\'t Read Me (ALICE)' WHERE (`id`=7)


Figure 3.26: AINI Customisation Profiles 


Moreover, developers can also customise their CA profiles by setting information such as 
name, gender, botmaster, birthday, birthplace, age, friends, favourites and language, as 
shown in Figure 3.26. This is achieved by editing the CA’s property profile in a relational 
database. This example of extending the framework demonstrates that the 
anthropomorphic avatar is also extensible. Hence, AINI’s framework shows that the 
architecture can easily be extended and applied to other domain applications and tasks, 
such as e-learning [218] and games [219]. 
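The profile mechanism can be sketched with Python's sqlite3 module. The table below mirrors the id, bot, name and value columns visible in Figure 3.26. The use of SQLite instead of MySQL, the render function and the replacement name “Nora” are assumptions for illustration. In this sketch, <bot name="..."/> tags in templates are filled from the profile at reply time, so changing a single row re-brands the agent without touching any AIML.

```python
import re
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE aini (id INTEGER PRIMARY KEY, bot INTEGER, name TEXT, value TEXT)")
db.executemany("INSERT INTO aini (bot, name, value) VALUES (1, ?, ?)",
               [("name", "Aini"), ("gender", "female"),
                ("birthplace", "Perth, Australia")])


def profile(conn, bot=1):
    """Load one agent's properties as a {name: value} dictionary."""
    return dict(conn.execute("SELECT name, value FROM aini WHERE bot = ?", (bot,)))


def render(template, props):
    """Fill <bot name="..."/> tags from the profile."""
    return re.sub(r'<bot name="([^"]+)"/>',
                  lambda m: props.get(m.group(1), "unknown"), template)


template = 'My name is <bot name="name"/>. I was born in <bot name="birthplace"/>.'
print(render(template, profile(db)))

# Re-brand the agent for another deployment by editing a single row
# ("Nora" is a hypothetical name).
db.execute("UPDATE aini SET value = ? WHERE bot = 1 AND name = 'name'", ("Nora",))
print(render(template, profile(db)))
```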

However, in migrating from the SARS domain application to the Bird Flu domain 
application, the most time-consuming task is creating AINI’s domain-specific knowledge 
bases. Automated knowledge extraction is therefore proposed in Chapter 4 to build the 
knowledge base from existing online documents on the Web. 

3.7 Comparison of AINI with Other Conversation Agents 
3.7.1 Multilevel Natural Language Query 

The AINI conversation system differs from other approaches in that it is implemented 
using a top-down multilevel natural language query [192]. Each level of this multilevel 
query is a plug-in module; the plug-and-play modules are the spell checker, Natural 
Language Understanding and Reasoning, FAQChat, Index Search, PMCBR and the 
supervised learning module. These plug-in modules can be integrated easily and offer a 
more scalable approach for the Web context than previous work, because AINI was 
designed to produce generic responses on a wide range of topics while also covering the 
targeted topic of conversation. 

3.7.2 Spelling Correction 

The original AIML algorithm does not include a spell-check function [46, 47]. ALICE 
does try to correct user input and act as a language tutor by correcting single common 
spelling mistakes, such as “your” when “you’re” or “you are” is intended. However, the 
methods used are hard-coded and usually catch only single errors, as shown in 
Figure 3.27. 
<category>
  <pattern>YOUR A *</pattern>
  <template>I think you mean "you're" or "you are", not "your".
    <srai>YOU ARE A <star/></srai>
  </template>
</category>


Figure 3.27: Spell check in AIML 


During a conversation, a response is shown only if the phrase or value in the pattern tag 
is matched exactly. Even with the AAA knowledge bases containing 45,318 categories, 
ALICE suffers from the same problem of misspelled keywords as ELIZA. The technique 
proposed here is a method for finding the best result rapidly. When AIML searches for a 
match to a specific word but cannot find one, the next best match is always a wildcard. 
Instead of matching an unknown word to a wildcard, the word should first be 
spell-checked, and possible alternatives checked for a match with higher priority. Here, a 
spell-check engine powered by GNU ispell, developed at the MIT AI Lab 65, is used to 
overcome this drawback of AIML (see Section 3.5.4.1). 
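The idea of giving spell-corrected alternatives priority over a wildcard can be sketched in Python. The sketch uses the standard library's difflib.get_close_matches as a stand-in for GNU ispell, whose near-miss algorithm is different. The vocabulary (assumed to be the words that occur in the stored patterns) and the 0.8 similarity cut-off are also assumptions.

```python
import difflib

# Hypothetical vocabulary: the words that appear in the stored AIML patterns.
PATTERN_WORDS = {"HOW", "ARE", "YOU", "TODAY", "WHAT", "IS", "BIRD", "FLU"}


def correct(words, vocabulary=PATTERN_WORDS, cutoff=0.8):
    """Replace unknown words with their closest pattern word, if one is close enough.

    Words with no close alternative are kept, so a wildcard can still match them.
    """
    fixed = []
    for word in words:
        if word in vocabulary:
            fixed.append(word)
            continue
        close = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        fixed.append(close[0] if close else word)
    return fixed


print(correct(["HOW", "ARE", "YOU", "TODA"]))
print(correct(["WHAT", "IS", "H5N1"]))
```

The corrected word list is then matched against exact pattern words before any wildcard is considered.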

3.7.3 Implementation 

The original version of ALICE, known as Program dB [132], implements the full AIML 
algorithm, which can be quite complex because of its handcrafted code. Despite the 
similarities between the AIML and AINI algorithms at Level 4 (the PMCBR algorithm), 
AINI has the advantage that it can be implemented using a single SQL query with 
MySQL and Perl on the open-source LAMP stack. AIML software has been enhanced 
and implemented in many different interpreters [214], such as the Java-based Program D, 
Common Lisp (Program Z), Pascal (Program P), Python (Program Y), SETL 
(Program Z) and C++ (J-Alice). In general, however, AIML interpreters are relatively 
large, complex programs. 

65 http://www.mit.edu/afs/sipb/project/sipb-athena/src/ispell/ 
3.7.4 Supervised Learning 


ELIZA and AIML generate random responses in the same way: instead of having a single 
response matching a particular pattern, the AIML template contains a set of possible 
responses, as shown in Figure 3.28. 

<category> 

<pattern>ARE YOU A BOY</pattern> 

<template> 

Yes I am a boy. 

</template> 

</category> 

<category> 

<pattern>ARE YOU A *</pattern> 

<template> 

Am I a <star/>? 

<random> 

<li>Yes.</li> 

<li>No.</li> 

<li>Maybe</li> 

<li>I don't know.</li> 

<li>Sometimes.</li> 

<li>Of course I am.</li> 

<li>Only when it matters.</li> 

</random> 

</template> 

</category> 


Figure 3.28: Random Responses Categories in AIML 


Based on the first category, ALICE is defined as a “boy”. However, if a user asks a very 
similar question, such as “Are you a little boy?”, ALICE’s answer is quite different, for 
example “Am I a little boy? No!” This contradictory answer is generated by the second, 
wildcard category, whose template picks a random response, because there is no exact 
matching category for the input text “Are you a little boy”. 


This method might be called a word puzzle. It clearly shows that the CA does not 
understand the input text at all in the human sense and simply gives a nonsensical answer 
according to its output template. This mechanism also shows that the CA’s knowledge 
bases lack a grasp of English grammar and can even be misleading, which is an important 
reason why users lose interest in chatting with a CA [220]. 

In AINI, by contrast, dynamic responses are used instead of random responses (see 
Section 3.7.5), and these are closely integrated with a supervised learning module. This 
enables AINI to collect unanswered questions in a separate database, where they are 
submitted to and administered by domain experts in the learning module. The random 
responses are implemented not with an AIML interpreter but with a Perl interpreter. For 
instance, if AINI’s answer is not available from the relational database, the response is 
picked from a Perl array response template whose entries are separated by a delimiter, 
“|”. The extra step involves splitting the string into three separate sentences and 
randomly selecting one of them. This method not only makes the dynamic responses 
more varied, but also sends the unanswered questions from past conversations to the 
learning module, from which they are added to AINI’s knowledge base. 
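The delimiter-based selection can be sketched as follows. The thesis implements it in Perl. This Python sketch reads the delimiter as a vertical bar, and the template string (built from fallback sentences quoted in Section 3.5.4.8), the function name and the list-based log are assumptions.

```python
import random

# Three fallback sentences stored as one delimiter-separated template.
RESPONSE_TEMPLATE = ("OK. I would do a search for it.|"
                     "Did I misunderstand your meaning?|"
                     "That's an interesting question.")


def dynamic_response(question, unanswered_log, rng=random):
    """Log the unanswered question and return one randomly chosen fallback."""
    unanswered_log.append(question)         # later reviewed by a domain expert
    choices = RESPONSE_TEMPLATE.split("|")  # split into three sentences
    return rng.choice(choices)


log = []
print(dynamic_response("I am more interested on chronic disease.", log, random.Random(1)))
print(log)
```

Passing a seeded random.Random makes the choice reproducible in tests, while the default uses the module-level generator.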

Finally, the newly generated matching rules are stored and upgraded by a domain expert. 
In the future, therefore, AINI’s knowledge bases will not only grow but will also be able 
to answer potential future questions. 

3.7.5 Dynamic Responses 

The original AIML has some problems that must be resolved, notably its depth-first 
search function, as raised in [46, 47]. Standard AIML uses depth-first search, which, as 
the name implies, does not optimise the result: it returns the first available solution 
found by searching through a tree of possibilities. There are, however, some simple 
ways to make AIML search work very quickly. 

In the proposed approach, the PMCBR algorithm uses the relational database to store a binary prefix tree of all the pattern strings listed in the AIML files. A recursive function written in Perl searches the tree for a string that matches the input query. Once a match is found, it dynamically retrieves the corresponding template from a second table. The template is stored as a single string containing AIML tags, which the Perl code must then parse. The relational database provides rapid access to the information usually stored in AIML files. With the enhanced version of AIML, the output sentence can be dynamically constructed using parts of the input sentence; therefore, the relational database does not have to store every possible response.
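The recursive matching and template filling can be sketched in Python. This is not PMCBR itself: it is an in-memory illustration that assumes a word-level prefix tree of nested dictionaries (in place of the binary prefix tree held in relational tables), supports only the AIML `*` wildcard, tries exact words before the wildcard, and keeps templates in a separate dictionary to mirror the "second table". The `<star/>` substitution shows how part of the input can be reused in the output. All names and sample patterns are hypothetical.

```python
import re


def build_tree(patterns):
    """Build a word-level prefix tree; the key "$" marks the end of a pattern."""
    root = {}
    for pattern in patterns:
        node = root
        for word in pattern.split():
            node = node.setdefault(word, {})
        node["$"] = pattern
    return root


def match(node, words, star=None):
    """Depth-first recursive search; returns (pattern, words matched by '*') or None."""
    if not words:
        return (node["$"], star) if "$" in node else None
    head, rest = words[0], words[1:]
    if head in node:  # try the exact word first
        found = match(node[head], rest, star)
        if found:
            return found
    if "*" in node:  # '*' consumes one or more words
        for i in range(1, len(words) + 1):
            found = match(node["*"], words[i:], words[:i])
            if found:
                return found
    return None


TEMPLATES = {  # stands in for the second (template) table
    "HELLO": "Hi there!",
    "WHAT IS *": "<star/> is a topic I can look up for you.",
}


def respond(query, tree, templates):
    """Normalise the query, match it, and fill <star/> in the stored template."""
    words = re.sub(r"[^A-Z0-9 ]", "", query.upper()).split()
    found = match(tree, words)
    if found is None:
        return None
    pattern, star = found
    return templates[pattern].replace("<star/>", " ".join(star or []))


tree = build_tree(TEMPLATES)
print(respond("What is H5N1?", tree, TEMPLATES))
```

Because the search stops at the first successful branch, it has the same "first solution wins" behaviour as the depth-first search discussed above; a `None` result is where a fallback response would take over.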

3.8 Summary 

This chapter has discussed many existing technologies that could be combined into a 
CA framework for the Web, mobile services and CMC applications; these technologies 
are already in existence, but not as an integrated entity. From the experiment and proxy 
conversation, it can be anticipated that AINI’s framework could play an important role 
in popularising the concept of intelligent CAs, not only on the web but also on mobile 
services or other network applications. This research paves the way for more humanoid 
user interfaces based on human language technologies. 

It was also found that the multilevel NL-Query models, DKMM and the modified N-tiered
architecture address the extensive variability that is encountered in today’s CA
frameworks. Over-generalised interfaces make the framework hard to understand and 
use. There is a delicate balance between flexibility and simplicity. Performance cannot 
be compromised for the sake of flexibility and the lowest common denominator solution 
is often unacceptable. It is necessary to have flexible development environments, tools, 
and regression tests. Application interoperability is an essential direction for future 
improvement. 
AINI’s framework has demonstrated that it could be deployed within any existing website to provide intuitive user interaction for that website. The aim is to provide a framework that is easy to use and convenient for developers. Personality and personalisation can be incorporated within a CA’s framework with no need for redesign or remodelling. The next chapter covers another important aspect: the assessment of the CA’s knowledge bases extracted from the Web. It addresses the issue of the trustworthiness of knowledge bases and information extraction for CAs.
CHAPTER 4


AN ASSESSMENT OF THE TRUSTWORTHINESS OF 
KNOWLEDGE BASES FOR CONVERSATION AGENTS 

4.1 Introduction 

This chapter explores agent knowledge in the data server tier of AINI’s framework and, in particular, its relationship to the related study of information trustworthiness and knowledge extraction. Some important methods for knowledge extraction from online documents are also discussed, in relation to the design and development of domain-specific information for CAs’ knowledge bases. This chapter focuses on a novel approach based on the proposed Web Knowledge Trust Model (WKTM) [173, 175, 190, 221-223] and the Automated Knowledge Extraction Agent (AKEA) [111, 112, 156]. These proposals have been reported as research findings from this study and are discussed in Sections 4.5 and 4.6.

To reiterate, this thesis aims to present a methodology to facilitate the use of online documents from the World Wide Web (WWW) and to use the acquired knowledge to develop the intelligent CA, AINI. Information extracted from public web pages can be problematic, because web pages may contain incorrect information or be outright hoaxes. Therefore, the WKTM is proposed in this study to determine ‘trustworthy’ websites and to ensure the credibility and reliability of the knowledge extracted from web-derived corpora.

This study has applied the WKTM to extract pandemic Bird Flu domain knowledge from the web using AKEA. The WKTM is based on a modular design and is extensible through polymorphic inheritance of components. This chapter also provides an integrative review of the literature on the trustworthiness of information on websites. In addition, the WKTM applies both quantitative and qualitative approaches to determine trustworthy websites. Within the context of this research, the term authority implies ‘reputation’, ‘empowerment’ and ‘credentials’, and the term credibility implies ‘reliability’, ‘truthfulness’ and ‘unbiased’ as characteristics of the selected websites.

At present, the WWW provides a distributed hypermedia interface to a vast amount of information. For instance, Google [224] currently has a training corpus of more than one trillion words (1,024,908,267,229) from public web pages. While the Web provides a huge source of information and data, commercial search engines are not the best way to gather answers to queries because of the overwhelming number of results returned by a search. Nevertheless, despite certain obvious drawbacks such as the lack of control, there is no doubt that the WWW is a source of data of unprecedented richness and accessibility [225].

As reported in previous papers [113, 173] and discussed in Chapter 3, AINI’s operation as a conversation agent is based on open-domain and domain-specific knowledge bases. The domain-specific knowledge bases consist of Natural Language Corpora and answers to Frequently Asked Questions (FAQ). Both components have been extracted from online documents using the Automated Knowledge Extraction Agent (AKEA) [156]. The AINI software agent was programmed to provide up-to-date information and to deliver essential information from trusted sources. The goal is for AINI to interact with its users naturally and to provide reliable information.
The aim of this chapter is to develop intelligent agent techniques that will help to acquire information from websites that are reputable, credible, reliable and accountable. The WKTM was developed with the following specific aims:

• to determine, through corpus analysis, the effectiveness of creating a CA’s 
knowledge base with an unbiased corpus. The evaluation was based on data 
extracted from freely available online documents from the World Wide Web. 

• to understand how the WKTM can improve the selection of ‘trustworthy’ websites
and, most importantly, how this model can be applied to other domains.

4.2 Trust and Methodology 

There are 936 million pages⁶⁶, or between thirty and fifty thousand health-related sites, available on the Internet. It has also been estimated that over 21 million users have been flooded with the health information provided therein [226]. The Rhode Island HEALTH Web Query System [227], for example, reflects an ever-increasing public expectation of reliable and trustworthy online health information; the Rhode Island system also provides health professionals and community organisations with valuable information. However, less than half of the medical information available online has been reviewed by doctors [226]. There have been numerous detailed assessments of the quality of health information on the web. Eysenbach G. et al. [228] carried out a systematic review of health website evaluations and noted that the most frequently used quality criteria included accuracy, completeness, readability and the design of the site. There is a real need to explore and understand the ways in which the trustworthiness of online health information is assessed. Unfortunately, few studies and little empirical evidence exist for a trust model of health-related websites [229, 230].

66 Received 936 million page hits for a query on the Google search engine with the keyword “health” on 6 July 2007, http://www.google.com/search?source=ig&hl=en&q=health
Table 4.1: Comparison of Trust Types and Measures in Previous Research⁶⁷

| Study | Type of trust | Aim of the study | Experiment setup | Object of trust |
|---|---|---|---|---|
| McAllister (1995) [231] | Cognition-based trust: trust in and respect for people who are trustworthy for good reasons and on the basis of evidence. | This study addresses the relationships of interpersonal trust among managers and professionals in an organisation. | Qualitative study: empirical study | Interpersonal trust in organisational settings |
| Parkhe (1998) [232] | Institutional-based trust: based upon intermediary mechanisms that provide implicit guarantees. | This research shows how partners can proactively manage an alliance relationship in order to develop trust. | Qualitative study: empirical study | Inter-organisational trust |
| de Ruyter et al. (2001) [233] | e-services-based trust: exploring the antecedents of trust, relative advantage and perceived risk in the adoption of e-services. | This research investigates the impact of organisational reputation, relative advantage and perceived risk on perceived service quality, trust and customers’ behavioural intentions towards adopting e-services. | Qualitative study: experimental study in which participants were presented with offline role-playing scenarios | Customer trust |
| Kanawattanachai & Yoo (2002) [234] | Affect-based trust: involves the emotional elements and social skills of trustees. | This research examines how the patterns of trust in virtual teams change over time. | Qualitative study: empirical study | Interpersonal trust in virtual teams |
| Kim & Prabhakar (2002) [235] | e-banking-based trust: exploring initial trust in the adoption of online banking. | This research hypothesised that consumers’ propensity to trust and structural assurances are antecedents of consumers’ initial trust in the electronic channel as a banking medium. | Qualitative study: online survey | Online banking consumers’ trust |
| Gefen et al. (2003) [236] | e-commerce-based trust: exploring trust in an e-commerce vendor, using an online travel agency. | This research highlights that online consumer trust is as important as the Technology Acceptance Model (TAM) use-antecedents. | Qualitative study: experiential survey approach; participants performed a product search at an online bookstore | Consumers’ trust in online stores |
| Grabner-Krauter & Kaluscha (2003) [237] | e-commerce-based trust: facilitating a multi-level and multi-dimensional analysis of research problems related to trust in e-commerce. | This research provides an integrative review of the empirical literature on trust in e-commerce in order to allow cumulative analysis of results. | Qualitative study: online survey | Institutional phenomena (system trust) and personal and interpersonal forms of trust |
| Paul & McDaniel Jr. (2004) [238] | Relational-based trust: the extent to which one feels a personal attachment to the other party and believes that the other party intends to do good to him/her. | This research examines the relationship between interpersonal trust and virtual collaborative relationship performance. | Qualitative study: field study | Interpersonal trust in virtual organisational settings |
| Leimeister et al. (2005) [239] | System-based trust: based on the perceived properties of, or reliance on, a system or institution. | This study describes how trust-enabling functionalities can be systematically designed and implemented in a virtual community. | Qualitative study: survey | Interpersonal and system trust in virtual community |
| Ratnasingam (2005) [240] | Technological-based trust: confidentiality mechanisms, authentication mechanisms and access control mechanisms. | This research examines how institutional structures lead to technology trust for online B2B e-commerce. | Qualitative study: interview | Inter-organisational trust |
| Hsu et al. (2007) [241] | Information-based trust: explores the nature of trust and divides it into three constructs: economy-based, information-based and identification-based trust. | This research identifies the antecedents that support or hinder an individual’s knowledge-sharing behaviour by applying a Social Cognitive Theory (SCT)-based model from both the social environment and personal cognition aspects. | Qualitative study: web-based survey | Virtual Communities (VC) trust |

67 These empirical studies were collected from proceedings, journal papers and articles on the Internet. Some of the findings are also available from Grabner-Krauter & Kaluscha [237].
Table 4.2: Website Trust Type and Methodology in Previous and Current Research

| Study | Type of trust websites | Aim of the study | Methodology: theoretical | Methodology: experiment | Object of trust |
|---|---|---|---|---|---|
| Koufaris & Hampton-Sosa (2002) [242] | e-vendor: exploring the antecedents of initial trust in an online company, using several e-vendors | This research shows that providing customers with enjoyment and perceived control leads to greater trust in the company itself, through the customers’ perceptions of the website’s usefulness and ease of use. | Technology Acceptance Model (TAM) | Qualitative study: experiential survey with online questionnaire; participants visited an unfamiliar website and performed a product search | Company websites |
| Fox & Rainie (2002) [229] | e-health: verifying online information and health seekers’ trust in the online environment | This research identifies a trusted online source of credible, comprehensive and clinical healthcare information, and of secure, confidential communications. | - | Qualitative study: telephone interviews | Health websites |
| Fogg, Marable et al. (2002) [243] | e-commerce, finance, health, news, search engines, sport, travel: combining assessments of both trustworthiness and expertise to arrive at a final credibility perception | This research investigates what causes people to believe, or not believe, what they find online. | Web Credibility | Qualitative study: comments on the websites | e-commerce, entertainment, finance, health, news, search engines, sports and travel websites |
| Corritore et al. (2003) [244] | e-commerce: identifies three perceptual factors that impact online trust: perception of credibility, ease of use and risk | This research proposes an online trust model, specifically of trust between people and informational or transactional websites. | Online Trust Model | Qualitative study: interaction between users and websites | e-commerce websites |
| Sillence et al. (2004) [230] | e-health: identifying the design and information content factors that influence trust and mistrust of online health sites | This research aims to fill the gap in our knowledge of the ways in which real-world consumers evaluate online health information and advice (menopause). | Staged Trust Model | Qualitative study: observation and discussion | Health websites |
| Singer (2004) [245] | e-business: producing a more advanced, trust-based user interface and experience | This research collaboration between MIT's Sloan Center for eBusiness and Intel aims to create “trusted advisor” techniques and an online persona to help customers with a specific task on Intel’s website. | WebTrust Methodology | Qualitative study: user click streams and user surveys | Intel’s website |
| Goh & Fung [190, 222] | e-health: facilitating the use of trustworthy online documents relating to pandemic Bird Flu from the World Wide Web to create knowledge bases for intelligent conversation agents (robots) | This research proposes a Web Knowledge Trust Model to find ‘trustworthy’ websites and to ensure the credibility and reliability of knowledge extraction based on web-derived corpora. | Web Knowledge Trust Model (WKTM) | Qualitative study: computational linguistics; Quantitative study: PageRank and survey | Pandemic websites |
Researchers in this discipline are also tackling the question of how to evaluate the quality and trustworthiness of online resources [243, 246-250]. Tables 4.1 and 4.2 summarise the empirical studies on trust and their methodologies. The Pew Internet and American Life Project’s report [229] found that about a third of the respondents felt the need to check the accuracy and reliability of the information they read.

As website and Internet technologies become more established, attention is turning to the factors that affect the success of websites. Key among these factors is “trust”. Many studies have recognised that trust is a multidimensional construct and have examined different types of trust. Most of the studies were conducted in organisational settings or in the realm of electronic commerce [231, 232, 236-238, 240, 244, 251]. Trust has been approached differently across domains such as banking [235], management [231, 232, 234, 238], business [233, 242, 245], and information and technology [239-241], among others. However, although trust has been studied in a variety of disciplines, each of these disciplines has produced its own concepts, models, methodologies, definitions and findings. In fact, even within a given field, there is often a lack of agreement and of focus of effort. Whatever the nature of or approach to research on trust, researchers from every discipline acknowledge its value. Moreover, many scholars in HCI have begun to study trust in online information. The aims of such studies have ranged from issues of design and interface elements and perceived website credibility [252, 253] to the extent to which the technology is perceived and responded to.

From a methodological viewpoint, the proposed research is based on a combination of qualitative and quantitative approaches. In the qualitative approach, an empirical survey was carried out by collecting comments from experts on web credibility; the quantitative approach, on the other hand, is based on log likelihood and Google’s PageRank results for the corpora collected from the web. According to the literature [254, 255], combining methodologies can potentially explore more advanced facets of online trust. In this research, both the qualitative and the quantitative methods made important contributions to the results.
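The log-likelihood measure used in the quantitative approach is not spelled out at this point, so the following Python sketch assumes the standard two-corpus log-likelihood (G²) keyness statistic: each corpus's expected frequency is its share of the combined frequency in proportion to its size, and G² = 2 Σ O ln(O/E). The function name and the sample figures are hypothetical.

```python
import math


def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Two-corpus log-likelihood (G2) for one term.

    freq_a, freq_b: frequency of the term in corpus A (e.g. a domain corpus)
                    and corpus B (e.g. a reference corpus).
    size_a, size_b: total number of tokens in each corpus.
    """
    total = freq_a + freq_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    g2 = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # the 0 * ln(0) term is taken as 0
            g2 += observed * math.log(observed / expected)
    return 2.0 * g2


# Hypothetical counts: 500 hits in a 100,000-token domain corpus versus
# 50 hits in a 1,000,000-token reference corpus.
print(log_likelihood(500, 100_000, 50, 1_000_000))
print(log_likelihood(10, 100, 10, 100))  # 0.0 when relative frequencies match
```

A larger G² means the term's relative frequency differs more between the two corpora; by convention, values above 3.84 and 10.83 correspond to p < 0.05 and p < 0.001 for one degree of freedom.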

4.3 Websites as Objects of Trust 

There are many related concepts that are often confused with trust. Computer Mediated Communication (CMC) researchers study individual-to-individual trust relationships mediated through technology [256]. In contrast, other researchers focus on technology as the object of trust [257]. Researchers in the field of intelligent agents, for example, have looked at trust between software agents, a relationship in which agents can be objects of trust, and others [11, 258, 259] have examined how people treat intelligent agents as real people and, by extension, as objects of trust. They found that people do enter into “relationships” with computers, websites, companies, people and other new media. In this research, websites are defined as the objects of trust in the WKTM. The term ‘website’ can be used to include the underlying information from online documents, the interactive user experience with the website and/or the ownership of the website. To limit the scope of the study, this chapter focuses on pandemic bird flu health-related websites as the object of trust.

4.4 Trust Model 

Few trust models are available for e-commerce, e-business, e-banking, e-health, etc., as detailed in Table 4.2. Researchers at MIT's Sloan Center for e-Business and Intel, for instance, are trying to improve e-commerce trust with a new online system based on the WebTrust methodology [245]. The WebTrust methodology has enabled Intel to generate greater trust between Intel’s website and its customers. Koufaris M. and Hampton-Sosa W. [242] hypothesised that the perceived usefulness and perceived ease of use of a website, based on the Technology Acceptance Model (TAM) developed by Davis [260], could be considered as predictors of consumers’ trust in an online company when they first come into contact with its website. Corritore et al. [244], in proposing the Online Trust Model, focused specifically on trust between people and informational or transactional websites. In this model, they identified three perceptual factors that affect online trust: perception of credibility, ease of use and risk. Sillence et al. [230], who developed the Staged Trust Model, focused their research on online health advice, specifically for users seeking information on menopause. According to Sillence et al., the information content of a website is the main factor influencing trust and mistrust of online health sites.

A closer look at the methodologies and objects of trust used by other scholars revealed that six of the sixteen studies (as shown in Table 4.2) focused on websites as objects of trust, and only three of those six examined health-related websites [229, 230, 243]. Two of the studies used the same research model (the Technology Acceptance Model) [236, 242], but none has focused on how to select trustworthy websites and extract knowledge from them for CAs. The following section describes the proposed WKTM approach in detail.

4.5 Web Knowledge Trust Model (WKTM) 

The objective of the WKTM is to provide solutions that enable developers to follow the procedure described in Figure 4.1. The model is expected to be applicable to other application domains as well. The procedure outlined below addresses the question of “how to select trustworthy domain knowledge from existing online web documents?” The WKTM procedure can be divided into five stages. First, the target web domain knowledge to be extracted is determined; for this study, pandemic Bird Flu is the focus of the domain knowledge. In the second stage, a number of seeds are used in an iterative algorithm to bootstrap the corpora using unigram terms from the web.

Figure 4.1: Web Knowledge Trust Model (WKTM)
The process then proceeds to the third stage, in which bigram terms are extracted based on the final corpus and the unigram terms extracted in the previous stage. Once the sets of domain URLs have been collected, they are submitted as queries to the search engine via the Google API (Application Program Interface)⁶⁸. All the downloaded URLs are used to build a final domain corpus. In the fourth stage, the corpus obtained is evaluated using Log Likelihood, Google’s PageRank algorithm [261] and Stanford’s Web Credibility criteria [243]. Finally, the five most trustworthy websites are selected and information is extracted from them using AKEA.

68 http://www.google.com/apis
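The PageRank component of the fourth stage can be illustrated with the classic power-iteration formulation and a damping factor of 0.85. The thesis relies on Google's own PageRank values, so this Python sketch only demonstrates the idea on a hypothetical four-site link graph; the site names, the `top_k` helper and the parameter defaults are assumptions, and ranking by PageRank alone is a simplification of the WKTM's combined criteria.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank; `links` maps each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:  # dangling page: spread its rank evenly over all pages
                for t in pages:
                    new[t] += damping * rank[page] / n
        rank = new
    return rank


def top_k(scores, k=5):
    """Return the k highest-scoring pages (the WKTM keeps the top five sites)."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical link graph between four sites.
LINKS = {
    "a.example": ["b.example"],
    "b.example": ["a.example"],
    "c.example": ["a.example", "b.example"],
    "d.example": ["c.example"],
}
ranks = pagerank(LINKS)
print(top_k(ranks, 2))
```

Sites that receive links from other well-linked sites accumulate rank, which is why PageRank is used here as a proxy for a site's reputation.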

4.5.1 Selecting Domain-Specific Web Knowledge 

There are several sources from which web knowledge can be extracted: either well-edited and carefully balanced corpora such as the British National Corpus (BNC), or noisy but useful web documents, which are largely unexplored [262]. The Google search engine claims to have indexed more than 14 billion pages on the web to date. Various studies [263], [264], [265] have indicated that, historically⁶⁹, the web has doubled in size every nine to twelve months. Web pages are changing rapidly. In the period between 1st September 2006 and 20th October 2006, 3.36 billion pages were indexed by Google. This equates to about 67 million web pages added daily. With this huge volume of information, it is relatively “easy” to find a list of pages containing any given query terms. The difficult part is to select, from the myriad of possible matching pages, the “top” 10 or 20 according to some computable quality measure which, ideally, closely resembles the user’s notion of relevance. Clearly, the ability of any search engine to closely match this human notion has a major impact on its success.
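The daily figure can be checked with a short worked calculation, assuming both end dates are counted (30 days in September plus 20 days in October, i.e. 50 days):

```latex
\frac{3.36 \times 10^{9}\ \text{pages}}{(30 + 20)\ \text{days}}
  = 6.72 \times 10^{7}\ \text{pages/day}
  \approx 67\ \text{million pages/day}
```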

In this experiment, the Bird Flu pandemic is the focus of the domain knowledge base. Pandemic flu has become an important research topic because it is a real-world challenge that may have severe global consequences. The head of philanthropy at Google, Larry Brilliant, has also described his vision of how information technology can be used to fight pandemics [266]. However, as the Web becomes increasingly chaotic and carries a strong possibility of misleading and inaccurate health information, it could become harmful to unwary users. Selecting trustworthy web pages is therefore an important factor in ensuring the long-term viability of the Web as a useful global information repository. Detailed descriptions of the subsequent stages in the WKTM are given in the following sections.

69 Previous statistics of Google’s indexed pages can be accessed at http://web.archive.org/web/*/http://www.google.com

4.5.2 Seeding 

The purpose of this stage is to select the corpus that will serve as a data acquisition resource for building the CA’s knowledge bases. The objective is to create a “balanced” corpus of web pages that contains relevant keywords and documents of a given domain. For seeding, we use words from a general training corpus, the British National Corpus (BNC). The BNC consists of a collection of 100 million words of written and spoken language from a wide range of sources. It is designed to represent a wide cross-section of British English from the later part of the 20th century, in both spoken and written forms. Since this research focuses on the Bird Flu pandemic, the initial seeds should come from its generic terms, “bird” and “flu”. From these seeds, we queried the online “specialised terminology” lists of the health information website MedlinePlus⁷¹ (Medical Dictionary). The term “bird flu” was found to be related to “avian influenza”. With these four seed words, a query was sent to the BNC online corpus and the word “virus” was obtained as an additional seed. From the bigram observations, the seed “virus” occurred 19 times in “flu virus” and 11 times in “influenza virus”. Finally, we collected the five terms “bird”, “flu”, “avian”, “influenza” and “virus” for use as initial seeds for the investigation.
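The bigram observation that produced “virus” as an additional seed can be sketched as counting the words that immediately follow a seed word. The BNC online query interface is not reproduced here: the toy text below is hypothetical, and `right_collocates` is an illustrative helper rather than a BNC function.

```python
from collections import Counter


def right_collocates(tokens, seeds):
    """Count words that immediately follow any seed word (bigrams 'seed X').

    Words that are themselves seeds are skipped, so only new candidates remain.
    """
    seeds = {s.lower() for s in seeds}
    words = [t.lower() for t in tokens]
    return Counter(nxt for cur, nxt in zip(words, words[1:])
                   if cur in seeds and nxt not in seeds)


text = ("the flu virus spread while the influenza virus mutated "
        "and bird flu cases rose as the flu virus returned")
counts = right_collocates(text.split(), ["bird", "flu", "avian", "influenza"])
print(counts.most_common(1))
```

The most frequent non-seed collocate is the natural candidate for the next seed, which is how “virus” emerges from bigrams such as “flu virus” and “influenza virus”.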

Once the seeds have been obtained, a comparison is made between the BNC corpus and Google’s large-scale corpus of public web pages. The purpose of the comparison is to determine whether the BNC covers similar terms and information that is as up to date as the web. A comparison of the results from the two sources is shown in Table 4.3.


Table 4.3: Comparing number of hit results from BNC and Google’s Corpora using the set of unigram and bigram seed words

| Seeds | BNC freq (counts) | BNC % | Google freq (web counts, ’000s) | Google % |
|---|---|---|---|---|
| Unigram | | | | |
| bird | 3869 | 63.14% | 14,400 | 33.13% |
| flu | 573 | 9.35% | 4,790 | 11.02% |
| avian | 45 | 0.73% | 1,360 | 3.13% |
| influenza | 145 | 2.37% | 2,120 | 4.88% |
| virus | 1496 | 24.41% | 20,800 | 47.85% |
| Bigram | | | | |
| bird flu | 1 | 3.23% | 602 | 46.45% |
| avian influenza | 0 | 0.00% | 180 | 13.89% |
| flu virus | 19 | 61.29% | 206 | 15.90% |
| influenza virus | 11 | 35.48% | 308 | 23.77% |

In Table 4.3, “Freq” is the number of hits returned from searching the BNC corpus and Google. As expected, the counts are much larger from Google than from the BNC: the total Google web count is 7,093 times the BNC count for the unigrams and 41,806 times for the bigrams. These data were collected on 12th December 2007. This evaluation demonstrates that the BNC yields small frequency counts because it is a much smaller corpus than Google’s. It can also be observed that the distributions of the seeds, for both the unigrams and the bigrams, differ between the two corpora. For instance, “avian influenza”, the scientific term for “bird flu”, does not appear in the BNC, whereas in the Google corpus this term accounts for 13.89% of the returns from the bigram seed queries. Similarly, the colloquial term “bird flu” occurred at a frequency of only 3.23% in the BNC, whereas in the Google corpus the same term accounted for almost 50% of the returns. From this exercise, it can be assumed that Google keeps pace with the continual increase in page volumes and scales up its corpus accordingly. The BNC, on the other hand, has not kept up with newer terms such as “avian influenza”, as indicated in Table 4.3. This also indicates that the BNC is insufficient by itself to provide the most up-to-date information on a domain such as this one. However, as an initial stage in establishing the seeds for further queries, the BNC has its merit as a training corpus. Google, on the other hand, returned over 600 thousand web counts for the seed “bird flu”, which makes any attempt to extract all the relevant knowledge from all these pages impractical. This therefore leads to the need to establish a more refined corpus and, in particular, to acquire knowledge from trustworthy sites. The process is described in the following section.
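The percentages and size ratios quoted above follow directly from the raw counts in Table 4.3. This Python sketch recomputes them; the dictionary layout and helper names are illustrative, and the only assumption is the one stated in the table: Google web counts are given in thousands.

```python
# Table 4.3 counts: (BNC frequency, Google web count in thousands).
UNIGRAMS = {"bird": (3869, 14400), "flu": (573, 4790), "avian": (45, 1360),
            "influenza": (145, 2120), "virus": (1496, 20800)}
BIGRAMS = {"bird flu": (1, 602), "avian influenza": (0, 180),
           "flu virus": (19, 206), "influenza virus": (11, 308)}


def shares(table, column):
    """Percentage share of each seed within its group (column 0 = BNC, 1 = Google)."""
    total = sum(row[column] for row in table.values())
    return {term: round(100 * row[column] / total, 2) for term, row in table.items()}


def google_to_bnc_ratio(table):
    """How many times larger the total Google count is than the total BNC count."""
    bnc = sum(b for b, _ in table.values())
    google = sum(g for _, g in table.values()) * 1000  # Google counts are in '000s
    return google / bnc


print(shares(UNIGRAMS, 0))
print(shares(BIGRAMS, 1))
print(int(google_to_bnc_ratio(UNIGRAMS)), int(google_to_bnc_ratio(BIGRAMS)))
```

The unigram ratio is 43,470,000 / 6,128 and the bigram ratio is 1,296,000 / 31, which truncate to the 7,093 and 41,806 reported in the text.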

4.5.3 Building a Corpus 

In this stage, a domain-specific corpus on pandemic Bird Flu is built using the “crawling” approach. According to Broder et al. [267], crawling typically starts from a set of “seeds”. In this case, the seeds are obtained from the previous stage and consist of the five terms “bird”, “flu”, “avian”, “influenza” and “virus”. The crawling process consists of (a) fetching a page, (b) parsing the page to extract all linked URLs, and (c) for all the URLs not fetched previously, repeating steps (a) to (c).
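Steps (a) to (c) amount to a breadth-first traversal of the link graph. In this Python sketch, a hypothetical in-memory web (a dictionary mapping each page to its links) replaces real HTTP fetching and HTML parsing, and `max_pages` is an assumed stand-in for the stopping limit discussed below. In the WKTM the starting URLs would come from search queries built from the seed terms.

```python
from collections import deque


def crawl(seed_urls, fetch_links, max_pages=10):
    """Breadth-first crawl: (a) fetch a page, (b) extract its links,
    (c) queue every link not seen before; stop after max_pages fetches."""
    queue = deque(seed_urls)
    seen = set(seed_urls)
    fetched = []
    while queue and len(fetched) < max_pages:
        url = queue.popleft()
        links = fetch_links(url)  # steps (a) and (b)
        fetched.append(url)
        for link in links:        # step (c)
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return fetched


# Hypothetical toy web: page -> pages it links to.
TOY_WEB = {"s1": ["p1", "p2"], "p1": ["p2", "p3"], "p2": ["s1"], "p3": []}
print(crawl(["s1"], lambda u: TOY_WEB.get(u, []), max_pages=3))
```

The `seen` set is what implements "for all the URLs not fetched previously", so cycles such as p2 linking back to s1 do not cause repeated fetches.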

Normally, the crawling stops at some maximum value set by the limits of the Google API. For the free service, Google limits the number of queries to 1,000 per user per day. In this research, the number of URLs retrieved per search was set to 10. The Google API is used to analyse the result rankings for several queries of different categories using statistical tools in the BootCAT Toolkit [268]. Such corpora are essential resources for knowledge professionals who routinely work with specialised domain knowledge. The BootCAT toolkit implements an iterative procedure to bootstrap specialised corpora and terms from the web, and the process requires as input a list of “seeds” that are expected to represent the domain under investigation. The seeds are randomly combined, and each combination is used as a Google query string. A first query was sent to the Google search engine via the Google API to extract the first corpus, and new seeds were then extracted from this corpus to build the final corpus [267].
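The random combination of seeds into query strings can be sketched as drawing distinct k-word tuples from the seed list. This Python sketch does not call BootCAT; the tuple size, the number of queries and the `random_queries` helper are hypothetical parameters standing in for the search parameters discussed next.

```python
import random
from itertools import combinations


def random_queries(seeds, tuple_size=3, n_queries=10, seed=None):
    """Return up to n_queries distinct random seed tuples, each joined into a query string."""
    rng = random.Random(seed)
    tuples = list(combinations(sorted(seeds), tuple_size))
    rng.shuffle(tuples)
    return [" ".join(t) for t in tuples[:n_queries]]


SEEDS = ["bird", "flu", "avian", "influenza", "virus"]
queries = random_queries(SEEDS, tuple_size=3, n_queries=5, seed=1)
for q in queries:
    print(q)
```

Combining several seeds per query narrows each search towards the domain, while drawing the combinations at random avoids favouring any single seed.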

Several important search parameters have to be controlled, such as the number of queries to be issued in each iteration, the number of seeds combined to build a query, and the number of pages to be retrieved for each query. The first step of this phase is to extract from the corpus a list of single-word and two-word terms, known as unigrams and bigrams respectively. During this phase, an additional seed, “H5N1”, was found to be frequently connected with the other seeds in the final corpus. Hence, “H5N1” was added to the seed set as the sixth seed.
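One simple way to surface a candidate seed such as “H5N1” is to count the bigrams in which a non-seed token appears next to an existing seed. The sketch below is a frequency heuristic assumed for illustration, not BootCAT's term-extraction procedure, and the token list is a made-up example.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def candidate_seeds(tokens, seeds, top_k=1):
    """Rank non-seed tokens by how often they form a bigram with an existing seed."""
    seeds = set(seeds)
    counts = Counter()
    for left, right in ngrams(tokens, 2):
        if left in seeds and right not in seeds:
            counts[right] += 1
        elif right in seeds and left not in seeds:
            counts[left] += 1
    return [token for token, _ in counts.most_common(top_k)]

SEEDS = ["bird", "flu", "avian", "influenza", "virus"]
tokens = "avian influenza h5n1 virus found h5n1 avian flu outbreak".split()
unigram_counts = Counter(ngrams(tokens, 1))
bigram_counts = Counter(ngrams(tokens, 2))
print(candidate_seeds(tokens, SEEDS))  # ['h5n1']
```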

The next step is to retrieve the final URLs to build the final corpus. For simplicity and to avoid bias, only English-language HTML pages are included. For each of the six seeds, BootCAT sends a query to obtain the URLs related to the seeds. In this process, 1,500 URLs are returned. After duplicated and broken URLs are discarded, 1,428 URLs related to the domain under investigation remain.

A link analysis is then applied to the sites under each domain name. If two domain names are linked by both inbound and outbound connections, they are considered to be in a neighbourhood, and only the domains included in a neighbourhood are selected. A few pages from each selected domain are then randomly chosen and concatenated into a document. After post-crawl cleaning, a corpus of 2,641,660 tokens is obtained. This becomes the “Pandemic Corpus” in this research.
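The neighbourhood test and the page-sampling step might look like the following sketch. It assumes the link analysis has already been reduced to directed edges between domain names; the edge list, the page texts and the parameter `k` (pages sampled per domain) are hypothetical.

```python
import random

def neighbourhood(edges):
    """Domains that have at least one reciprocal (inbound and outbound) link."""
    links = set(edges)
    selected = set()
    for src, dst in links:
        if src != dst and (dst, src) in links:
            selected.update((src, dst))
    return selected

def build_document(pages_by_domain, domains, k=2, rng=None):
    """Randomly choose up to k pages per selected domain and concatenate them."""
    rng = rng or random.Random()
    parts = []
    for domain in sorted(domains):
        pages = pages_by_domain.get(domain, [])
        parts.extend(rng.sample(pages, min(k, len(pages))))
    return "\n".join(parts)

# Hypothetical domain-level link graph: example.org receives no link back
edges = [("who.int", "cdc.gov"), ("cdc.gov", "who.int"), ("cdc.gov", "example.org")]
print(sorted(neighbourhood(edges)))  # ['cdc.gov', 'who.int']
```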
Figure 4.2: Comparing the distribution of seed words between the smaller Pandemic Corpus and the Google Large-scale Corpus (bar chart of the percentage of each seed in each corpus; x-axis: seeds)


To verify the usability of the established corpus, its distribution of seed words was compared with that of the larger Google corpus, as shown in Figure 4.2. Although this corpus was created using a smaller set of seeds, its distribution is similar to Google's. This suggests that the unbiased method described in this proposal yields a coverage proportion similar to Google's. This leads to the next stage: evaluating the selected corpus and establishing trusted and reliable domain knowledge bases.


4.5.4 Evaluating a Corpus 

Before an evaluation can be carried out, it is necessary to define the ‘trustworthiness’ of websites, based on the credibility reports in [243] and [253]. Trustworthiness, a key element in the credibility calculus, is described by terms such as ‘reliable’, ‘truthful’ and ‘unbiased’. Authority, another dimension of trustworthiness, is described by terms such as ‘authorised’, ‘reputable’, ‘accredited’, ‘credentialed’ and ‘empowered’. The word “authority” often indicates that a government or an educational institution controls the contents of a site; this dimension of trustworthiness is therefore associated with reputable organisations. Combining these two dimensions suggests that highly trustworthy websites will be perceived to have high levels of both credibility and authority [243], [252], [269], [270], [271]. Based on these premises, this research aims to select the specific elements of a website that would lead to its being considered a ‘trustworthy’ website. The proposed elements are based on the log-likelihood ratio, PageRank and Web Credibility, and are described as follows.

4.5.4.1 Log Likelihood Ratio 

To verify that the smaller Pandemic Corpus extracted by the proposed model is compatible with the large Google corpus, the log-likelihood (LL) ratio is used as a quantitative assessment. The LL ratio is a statistical method in which a ratio is used to illustrate the coverage probability and accuracy within the confidence interval for two corpora. A higher LL ratio value indicates similar coverage probability, even with small sample sizes [272], [273], [274].

The method is simple and straightforward to apply. Tokens in a collocation tend to occur together more often than would be expected by chance. Statistical measures of association can be used to determine how likely the tokens in an N-gram are to occur together more often than average. For example, if “bird” is frequently followed by “flu”, the tokens “bird” and “flu” are considered closely associated with each other, since they occur together more often than at random. The statistical measures are computed from the co-occurrence and individual frequency counts of an N-gram. For each word in the two frequency lists, the LL statistic is calculated from a contingency table constructed as in Table 4.4:


Table 4.4: Contingency table for word frequencies

                             Corpus 1    Corpus 2    Total
Frequency of word            a           b           a+b
Frequency of other words     c-a         d-b         c+d-a-b
Total                        c           d           c+d


In the above table, the values ‘a’ and ‘b’ are the observed values (O) from the two corpora. The values ‘c’ and ‘d’ correspond to the number of words in Corpus 1 and Corpus 2 respectively; they are the two N values. The expected values (E) are calculated using equation (4.1):

$$E_i = \frac{N_i \sum_i O_i}{\sum_i N_i} \qquad (4.1)$$

where N_1 = c and N_2 = d, so that for a given word:

$$E_1 = \frac{c\,(a+b)}{c+d}, \qquad E_2 = \frac{d\,(a+b)}{c+d}$$


The calculation of the expected values takes account of the sizes of the two corpora. Based on these expected values, the LL ratio is calculated using equation (4.2):

$$-2\ln\lambda = 2\sum_i O_i \ln\left(\frac{O_i}{E_i}\right) \qquad (4.2)$$

In this case, with two corpora (i = 1, 2), the calculation is based on:

$$LL = 2\left(a \ln\frac{a}{E_1} + b \ln\frac{b}{E_2}\right)$$

The word frequency list is then sorted by the resulting LL values. This places the largest LL value at the top of the list, representing the word with the most significant relative frequency difference between the two corpora. In this study, the bigram-based version of the LL measure in the Ngram Statistics Package (NSP) 72 is used. In Table 4.5, the high LL-score values indicate the most important similarities between the two corpora in the coverage of the seed words. The results show that the proposed approach produces a confidence interval with a similar coverage probability and a high level of accuracy for the seed words in the two corpora.


Table 4.5: Log-likelihood Ratios for Pandemic Corpus vs Google Large-scale Corpus

Bigram              Pandemic Corpus  Google Large-scale Corpus (in '000s)  LL-Score
bird flu            12640            27,100                                +106266.72
avian influenza     9223             7,360                                 +95698.31
H5N1 virus          2342             5,080                                 +19635.16
Influenza virus     1307             11,300,000                            +7387.20
Pandemic Influenza  918              4,450                                 +6233.06
Total Corpus        2,641,660        1,024,908,267
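The LL computation of equations (4.1) and (4.2), together with the sorting step described above, can be sketched in Python as follows. This is not the NSP implementation used in the study; the function names are illustrative, and the corpus sizes c and d are taken as the sums of all counts in each frequency list. Larger values indicate a larger difference in relative frequency between the two corpora.

```python
import math

def log_likelihood(a, b, c, d):
    """LL statistic of equations (4.1) and (4.2) for a single word or bigram.

    a, b: observed frequencies in Corpus 1 and Corpus 2
    c, d: total number of words in Corpus 1 and Corpus 2
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in Corpus 1
    e2 = d * (a + b) / (c + d)  # expected frequency in Corpus 2
    total = 0.0
    if a > 0:  # a zero count contributes nothing (0 * ln 0 is taken as 0)
        total += a * math.log(a / e1)
    if b > 0:
        total += b * math.log(b / e2)
    return 2 * total

def rank_by_ll(freq1, freq2):
    """Sort words by LL value, largest (most significant difference) first."""
    c = sum(freq1.values())
    d = sum(freq2.values())
    scores = {w: log_likelihood(freq1.get(w, 0), freq2.get(w, 0), c, d)
              for w in set(freq1) | set(freq2)}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Worked example: E1 = 10 and E2 = 20, so LL = 2 * (20 ln 2 + 10 ln 0.5) = 20 ln 2
print(round(log_likelihood(20, 10, 1000, 2000), 2))  # 13.86
```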



4.5.4.2 PageRank 

Evaluating large numbers of websites manually is not an easy task. Another approach is to use Google’s PageRank algorithm [261]. PageRank is a democratic process that relies on the nature of the Web, using its vast link structure as an indicator of an individual page's value. It is the core algorithm of Google's search engine. The algorithm is a complex, automated method, which makes human tampering with PageRank results extremely difficult. It should be noted that Google does not sell placements within the results, thereby maintaining the democratic and unbiased nature of the search results. In this research, PageRank is used as one of the criteria for evaluating the trustworthiness of websites based on link analysis. A similar application of link analysis is evaluating the quality of an academic work by analysing the number of citations it receives. The number of backlinks to a given page gives some approximation of the page's importance or quality. PageRank extends this idea by not treating links from all pages as equal. The algorithm also normalises the final value to a range of 0 to 10.

72 NSP Package can be downloaded at http://search.cpan.org/~tpederse/Text-NSP-l.03/

PageRank is defined in equation (4.3):

$$p_i = (1 - d) + d \sum_{j=1}^{n} \frac{t_{ij}}{c_j}\, p_j \qquad (4.3)$$

Suppose n is the number of web pages. Let t_ij = 1 if page j points to page i, and zero otherwise, and let c_j be the number of pages pointed to by page j (i.e., its number of outlinks). The Google PageRank p_i is defined by this recursive relationship, where the parameter d is a damping factor that can be set between 0 and 1; in this study, d is set to 0.85. In this formulation, the PageRanks of all n pages sum to n, so the average PageRank is one; dividing each p_i by n gives a probability distribution over web pages. The PageRank p_i can be calculated using a simple iterative algorithm and corresponds to the principal eigenvector of the normalised link matrix of the Web.



Figure 4.3: An Example of PageRank Corresponding to Web Pages
Solution: p = (1.49, 0.78, 1.58, 0.15)

Note that Page 4 in Figure 4.3 has no incoming links and hence receives the minimum PageRank of 1 - d = 0.15. In this formulation, each page starts from a base value of 1, and the damping factor d = 0.85 determines the ‘voting value’ that a page passes on through its outbound links; the remaining 0.15 is assigned to every page regardless of its inbound links.
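Equation (4.3) can be solved by simple fixed-point iteration, starting every page at 1. Because Figure 4.3 itself is not reproduced here, the four-page link structure below is an assumption; it was chosen because iterating equation (4.3) on it with d = 0.85 yields the solution values quoted for the figure.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate equation (4.3): p_i = (1 - d) + d * sum_j (t_ij / c_j) * p_j.

    links maps each page j to the list of pages it points to.
    """
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    out_degree = {j: len(links.get(j, [])) for j in pages}  # c_j
    p = {page: 1.0 for page in pages}  # start every page at 1
    for _ in range(iterations):
        p = {
            i: (1 - d) + d * sum(p[j] / out_degree[j]
                                 for j in pages if i in links.get(j, []))
            for i in pages
        }
    return p

# Assumed link structure: 1 -> 2, 3; 2 -> 3; 3 -> 1; 4 -> 3 (page 4 has no inbound links)
ranks = pagerank({1: [2, 3], 2: [3], 3: [1], 4: [3]})
print({page: round(value, 2) for page, value in ranks.items()})
# {1: 1.49, 2: 0.78, 3: 1.58, 4: 0.15}
```

The PageRanks sum to 4, the number of pages, which matches the solution values given for Figure 4.3.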


Based on the 1,428 URLs returned from stage 3, a query is sent to Google’s PageRank directory to determine their rankings using the Google PageRank algorithm (see Appendix C). The algorithm calculates the PageRank of a specified URL and returns an integer value from 0 to 10. Figure 4.4 shows the results for the top ten sites on the PageRank scale 73.
Figure 4.4: Top Ten PageRank Scale for the Bird Flu Domain


The least important sites are those with a PageRank of 1. The most referenced, and supposedly most important, sites are those with a PageRank of between 6 and 10.


73 The PageRank results were extracted on 25 January 2007; results obtained at a different time may differ, since values computed by equation (4.3) change with the link structure of the Web.
4.5.4.3 Web Credibility


This section presents the credibility assessment of the top 10 websites related to this study, carried out using a qualitative approach. After the PageRank results were collected for the top 10 sites, each site was scored manually by experts based on the Web Credibility ranking criteria [243]. In this experiment, ten experts in web design from the American Association of Webmasters 74 were asked to assess the credibility of these sites based on their professional judgement.


Participants were given the opportunity to leave brief comments about the credibility of each site randomly assigned to them. The rankings are based on the 18 types of comments shown in Table 4.6. The participants’ comments on each site’s credibility were analysed and coded into two categories: trust or distrust. Examples of comments coded in these categories are shown in Table 4.7. The percentages in Table 4.6 represent how often a comment on that topic appeared in the entire set of comments. For instance, participants commented on the appearance and the information design of sites more often than on any other website feature, with these comments accounting for 39.9% and 37.6% respectively of the total of 168 comments.


Table 4.6: Frequency of the Comment Topics for Website Credibility

Comment Topic (addressing specific credibility issue)    Percentage (of 168 comments)
Design Look                                              39.9%
Information Design/Structure                             37.6%
Information Accuracy                                     29.2%
Information Bias                                         24.8%
Information Usefulness                                   21.6%
Information Focus                                        20.1%
Information Clarity                                      19.4%
Name Recognition and Reputation                          17.1%
Company Motive                                           16.5%
Advertising                                              14.8%
Writing Tone                                             9.7%
Identity of Site Operator                                8.4%
Site Functionality                                       7.9%
Past Experience with Site                                5.3%
Performance on Test by User                              4.1%
Readability                                              3.9%
Customer Service                                         3.7%
Affiliations                                             2.5%

74 http://www.aawebmasters.com


The data show that when participants evaluated the credibility of sites, they focused relatively more on issues of design (design look and information design) and trustworthiness (the accuracy, focus, usefulness, bias and clarity of information). Table 4.7 shows sample comments related to trustworthiness. As experts in web design, most participants ignored issues unrelated to their expertise and relied more heavily on areas they perceived to be more important, such as appearance and information design, accuracy and information dissemination [243].


Table 4.7: An Example of Comments Related to Trustworthiness of the Website Information

Information Focus
Trust (positive comments):
• This site was committed to news and the dissemination of information.
• Well focused and well organized, which lends to more credibility
• The site looks credible because of the extensiveness of information.
Distrust (negative comments):
• This site was credible, but too wide-ranging the information.
• Information badly presented on the site and too much information in single page.
• This site seems focused on drugs and little on the health information.

Information Accuracy
Trust (positive comments):
• This site provided useful and interesting knowledge or facts.
• Most of the articles on this Web site seem to be headline news and can be heard elsewhere.
Distrust (negative comments):
• Less credible and the fact that it crap and just to take an attention from the users.
• The information displayed on the website harder to verify.

Information Bias
Trust (positive comments):
• It is credible because the opinions contained therein are based on authority websites.
• The World Health Organisation has a great reputation. I would trust the information found in this website.
Distrust (negative comments):
• This site looks professional but based upon personal opinion and unaware of up-to-date information.
• This site is more commentary, and intolerant. The arguments are more debatable, and thus less “trusted.”

Information Usefulness
Trust (positive comments):
• This site provided useful and interesting knowledge about health information.
• I find this site useful and informative because it offers services to the community.
Distrust (negative comments):
• This site appears not very useful and more like other news websites.
• I feel the advertising on the site were off-putting and reduced the credibility of the site

Information Clarity
Trust (positive comments):
• This site spells out very clearly their mission and activities. Information presented on the site is easy to understand and straight to the point.
Distrust (negative comments):
• I found this site too much text and the information puzzling.
• Nothing on the site but fill of drugs terminology.


Participants thought that the visual appeal of a site was important; poor visual appeal did not encourage further exploration. Poor design gave a negative first impression, and the name of a website could lead participants to mistrust the site and its author’s intentions. Participants were also influenced by the websites’ names: 17.1% of comments related to name recognition and reputation. A good name was specific and to the point, but not patronising or too gimmicky. A poor name was not trusted and could lead to rapid rejection of the site. Most individuals preferred sites that were run by reputable organisations or had a medical or expert feel about them. They trusted the information on such websites, especially when the credentials of the site and its authors were made explicit. Sites indicating that the information originated from a government organisation (.gov) or an educational entity (.edu) were well received.

Table 4.6 indicates that content or information factors were among the important features in describing trusted or well-liked sites. Participants trusted the selected sites because they demonstrated in-depth knowledge of a wide variety of relevant topics and put forward clear and unbiased information. Participants were more likely to trust information if they could verify it and cross-check it against other websites, and they placed the most trust in sites that provided informative content on a wide range of relevant topics. Information was trusted if it was unbiased and supported by research articles or original sources. The selected sites contained a variety of content features, including Frequently Asked Questions (FAQs) and a section on hints and tips. This result is supported by Stanford’s large-scale Web Credibility study [243], whose data suggest that participants evaluating the credibility of health websites pay relatively more attention to the focus and usefulness of the information. What participants find in these areas apparently becomes a significant indicator of a site’s credibility.

The top 10 sites collected from Google PageRank were then ranked according to their mean scores, from highest to lowest. This ranking gives a general idea of which sites in this study were found to be the most or the least credible by the users. Sites were assessed in pairings: in each pairing, the site judged more credible gained a point and the less credible site lost a point. Over the course of the study, each site was evaluated many times, gaining and losing points along the way. At the end of the study, each site received a final score, which was the average (mean) of all the scores it had received from the experts, that is, the total number of points divided by the total number of times the site was ranked. A score of +1.0 means that the site was judged more credible in every pairing, and a score of 0.0 means that it was judged more credible in half of its pairings. Combining the methods described above, Table 4.8 shows the results of the trustworthiness analysis for the top 10 sites related to the domain knowledge in this study.
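The pairwise scoring scheme can be expressed in a few lines. In this sketch each pairing outcome is assumed to be recorded as a boolean (True when the site was judged the more credible of the pair); a mean of +0.5 then corresponds to being judged more credible in 75% of pairings, matching the legend of Table 4.8.

```python
def credibility_score(outcomes):
    """Mean score over pairings: +1 when judged more credible, -1 when judged less credible."""
    if not outcomes:
        raise ValueError("a site needs at least one pairing")
    points = sum(1 if won else -1 for won in outcomes)
    return points / len(outcomes)

def share_judged_more_credible(score):
    """Convert a mean score in [-1, 1] back to the fraction of pairings won."""
    return (score + 1) / 2

# A site judged more credible in 3 of its 4 pairings
score = credibility_score([True, True, True, False])
print(score, share_judged_more_credible(score))  # 0.5 0.75
```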


Table 4.8: Website Rankings based on Google’s PageRank and Stanford’s Web Credibility

Website                                          URL                    Web Credibility   PageRank   Total   Final
                                                                        Average Score     Score      Score   Ranking
U.S. Department of Health & Human Services       pandemicflu.gov 75     0.56              9          9.56    1
The White House                                  whitehouse.gov 76      0.42              9          9.42    2
World Health Organization                        who.int 77             0.54              8          8.54    3
U.S. Centers for Disease Control and Prevention  cdc.gov 78             0.41              8          8.41    4
U.S. National Library of Medicine                nlm.nih.gov 79         0.46              7          7.46    5
University of Rochester Medical Center           urmc.rochester.edu 80  0.37              7          7.37    6
U.S. Department of State's Bureau of             usinfo.state.gov 81    0.03              7          7.03    7
  International Information Programs
BBC News                                         news.bbc.co.uk 82      -0.45             7          6.55    8
ABC News                                         abcnews.go.com 83      -0.54             7          6.46    9
U.S. News & World Report                         usnews.com 84          -0.29             6          5.71    10

What Credibility Average Scores Mean:
+0.50 = judged as more credible in 75% of pairings
 0.0  = judged as more credible in 50% of pairings
-0.50 = judged as less credible in 75% of pairings

75 http://www.pandemicflu.gov/
76 http://www.whitehouse.gov/infocus/pandemicflu/
77 http://www.who.int/csr/disease/avian_influenza/en/
78 http://www.cdc.gov/flu/avian/
79 http://www.nlm.nih.gov/medlineplus/birdflu.html
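The total scores in Table 4.8 are the sum of the Web Credibility average score and the PageRank score. The sketch below recomputes the totals from the table's values and sorts them; it reproduces the final ranking and the top five sites selected in Section 4.5.5. The dictionary layout is an illustrative choice.

```python
# (Web Credibility average score, PageRank score) taken from Table 4.8
SCORES = {
    "pandemicflu.gov": (0.56, 9),
    "whitehouse.gov": (0.42, 9),
    "who.int": (0.54, 8),
    "cdc.gov": (0.41, 8),
    "nlm.nih.gov": (0.46, 7),
    "urmc.rochester.edu": (0.37, 7),
    "usinfo.state.gov": (0.03, 7),
    "news.bbc.co.uk": (-0.45, 7),
    "abcnews.go.com": (-0.54, 7),
    "usnews.com": (-0.29, 6),
}

def rank_sites(scores):
    """Total score = credibility average + PageRank; highest total ranks first."""
    totals = {site: round(cred + rank, 2) for site, (cred, rank) in scores.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

ranking = rank_sites(SCORES)
top_five = [site for site, _ in ranking[:5]]
print(top_five)
# ['pandemicflu.gov', 'whitehouse.gov', 'who.int', 'cdc.gov', 'nlm.nih.gov']
```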


4.5.5 Trustworthy Websites

The final set of URLs was further culled to include only selected sites attributed to regulated authorities. These are mainly government bodies, international organizations or educational institutions, all of which control and provide the contents of their respective sites. Once the seed set is determined, each URL’s page is further examined and rated as either reliable or reputable. As shown in Figure 4.5, the selection is reviewed, rated and tested for connectivity with the trusted seed pages. The expert participants in the web credibility assessment exercise preferred websites that contain a great deal of information over news coverage from media sites such as BBC News, ABC News and U.S. News. These results also showed that content or information factors were more important than design features in describing trusted or well-liked sites. In the current study, the five websites clustered at the top of the web trustworthiness rankings are pandemicflu.gov, whitehouse.gov, who.int, cdc.gov and nlm.nih.gov. All these highly credible sites were selected based on their PageRank and credibility scale scores, and they are clearly viewed by the expert participants as more credible than the other five sites in this study.


80 http://www.urmc.rochester.edu/pr/current_research/bird_flu/index.cfm

81 http://usinfo.state.gov/birdflu/

82 http://news.bbc.co.uk/1/hi/in_depth/world/2005/bird_flu/default.stm

83 http://abcnews.go.com/Health/AvianFlu/

84 http://www.usnews.com/usnews/health/bird-flu/
Figure 4.5: Comparing Trustworthiness of Top 10 Websites related to the Bird Flu Domain


The results support the proposal that the trustworthiness of websites is based not only on PageRank and Web Credibility but also on the ‘authority’ of the websites, which is not taken into account by the PageRank and Stanford Web Credibility criteria. There are other important factors in determining the ‘reliable authority’ of a site. They could be based on the site’s history and the number of back-links to government agencies, educational institutions and international organizations. The more established and relevantly linked a site is, the more likely it is to be considered ‘stronger’ or ‘more reliable’. This may effectively suggest that the linked site has ‘authority’, ‘reputability’, ‘empowerment’ and ‘credentials’. This will be examined in future work. Finally, the top five URLs are used as the main source of knowledge for AKEA to extract pandemic-related content to build AINI’s domain-specific knowledge base.
4.6 Automated Knowledge Extraction Agent (AKEA)

The Automated Knowledge Extraction Agent (AKEA) was developed during the outbreak of the Severe Acute Respiratory Syndrome (SARS) epidemic in 2003 [111, 112, 117, 156]. As mentioned earlier, our objective is to deliver essential information from trusted sources. The AKEA module is intended to build a knowledge base for the CA automatically. The information is extracted and mined from the top five trustworthy websites selected in Section 4.5. Figure 4.6 summarises AKEA’s architecture. The framework consists of seven modules developed for different applications [212, 275].



Figure 4.6: AKEA Architecture 


4.6.1 Crawler 

Given a selected trustworthy URL as a site of interest, the crawler traverses the pages, following links in a breadth-first manner, and returns all pages that meet the requirements for further filtering. The functions of the crawler are similar to those used in conventional crawler-based search engines [261, 276]. The crawler resolves the root domains selected from the trusted websites discussed in Section 4.5, i.e. pandemicflu.gov, whitehouse.gov, who.int, cdc.gov and nlm.nih.gov, and follows the links available on each page down to a user-defined depth, as shown in Figure 4.7. These configurations are stored in the crawler database. For every page crawled, a copy is returned for further processing by the wrapper. The activities of the crawler are logged in the crawler log database.
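A depth-limited, single-host version of the breadth-first crawl could be sketched as below. The `get_links` callback is a hypothetical stand-in for fetching and parsing a page, the page names under the cdc.gov root are invented for the example, and restricting links to the root URL's host is an assumed reading of “resolves the root domains”.

```python
from collections import deque
from urllib.parse import urljoin, urlparse

def crawl_site(root_url, get_links, max_depth=2):
    """Breadth-first crawl that stays on the root URL's host and stops at max_depth."""
    host = urlparse(root_url).netloc
    seen = {root_url}
    queue = deque([(root_url, 0)])
    pages = []
    while queue:
        url, depth = queue.popleft()
        pages.append(url)  # in AKEA, a copy of this page would go to the wrapper
        if depth == max_depth:
            continue  # do not follow links beyond the configured depth
        for href in get_links(url):
            link = urljoin(url, href)  # resolve relative links against the page URL
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return pages

# Hypothetical pages under one trusted root; the who.int link is off-site and skipped
SITE = {
    "http://www.cdc.gov/flu/avian/": ["a.htm", "http://www.who.int/"],
    "http://www.cdc.gov/flu/avian/a.htm": ["b.htm"],
    "http://www.cdc.gov/flu/avian/b.htm": ["c.htm"],
}
print(crawl_site("http://www.cdc.gov/flu/avian/", lambda url: SITE.get(url, []), max_depth=1))
# ['http://www.cdc.gov/flu/avian/', 'http://www.cdc.gov/flu/avian/a.htm']
```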





Figure 4.7: Crawler’s Configuration Interface 


4.6.2 Wrapper 

The wrapper extracts the title of the page, the author, the date, the content and other metadata encoded in the page. One of the notable features of this framework is that it is open to the exploration of any XML-annotated ontological information; in other words, this module is extensible to include future semantic web facilities. All this information is useful as additional semantic information describing the knowledge encoded in the pages and for expanding the knowledge base.
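As an illustration of the wrapper's metadata extraction, the sketch below uses Python's standard `html.parser` module to collect a page's title and its `<meta name="..." content="...">` pairs. It is not AKEA's wrapper: content extraction and XML-annotated ontology handling are omitted, and the sample page is hypothetical.

```python
from html.parser import HTMLParser

class MetadataWrapper(HTMLParser):
    """Collect the <title> text and <meta name="..." content="..."> pairs of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attributes = dict(attrs)
            name, content = attributes.get("name"), attributes.get("content")
            if name and content is not None:
                self.meta[name.lower()] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def wrap(html):
    """Return the page metadata as a dictionary."""
    parser = MetadataWrapper()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), **parser.meta}

# Hypothetical page
PAGE = """<html><head><title> Avian Flu Facts </title>
<meta name="Author" content="Example Agency">
<meta name="date" content="2007-01-25"></head>
<body><p>Avian influenza is an infection of birds.</p></body></html>"""
print(wrap(PAGE))
# {'title': 'Avian Flu Facts', 'author': 'Example Agency', 'date': '2007-01-25'}
```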
4.6.3 Text Categoriser

The text categoriser reads the content returned by the wrapper and assigns category labels to these pages using Naive Bayes probabilistic classifiers [21]. The category information is stored in a buffer and is used to decide whether a page in the buffer should be processed any further. This acts as a secondary filter after the crawler to ensure that only the desired web documents are further processed for the CA’s knowledge base.
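A multinomial Naive Bayes classifier with add-one (Laplace) smoothing is one common form of the technique named above. The following sketch, with an invented four-document training set and the hypothetical labels `relevant` and `irrelevant`, shows how such a classifier could act as the secondary filter; AKEA's actual features and categories are not specified here.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial Naive Bayes text classifier with add-one smoothing."""

    def __init__(self):
        self.doc_counts = Counter()              # documents seen per label
        self.word_counts = defaultdict(Counter)  # word frequencies per label
        self.vocab = set()

    def train(self, text, label):
        words = text.lower().split()
        self.doc_counts[label] += 1
        self.word_counts[label].update(words)
        self.vocab.update(words)

    def classify(self, text):
        """Return the label with the highest log posterior probability."""
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denominator = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in words:
                score += math.log((counts[word] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

classifier = NaiveBayes()
classifier.train("avian influenza h5n1 outbreak in poultry", "relevant")
classifier.train("bird flu virus spreads to humans", "relevant")
classifier.train("white house press briefing on budget", "irrelevant")
classifier.train("travel advisory and visa information", "irrelevant")

page_text = "h5n1 bird flu cases"
keep = classifier.classify(page_text) == "relevant"  # only relevant pages go on to parsing
print(keep)  # True
```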

4.6.4 Syntactic Preprocessor 

This stage reads the categorised text, tokenises it and identifies the dependencies among the tokens. Based on these dependencies, grammatical relations, that is, phrasal categories such as noun phrases, verb phrases and prepositional phrases, are extracted using the formal grammatical system X-MINIPAR, a modified version of MINIPAR [211] (see details in Section 5.2). The named entities in noun phrases are tagged with identifiers such as organisation and person, which are later used as predicates in first-order logic. Finally, the morphological roots of verb phrases are identified for later use.

4.6.5 Semantic Parser 

At this stage, the parser translates the syntactically tagged text and the relevant metadata into a semantic representation in the form of first-order logic terms, using a set of parse rules. The morphological roots of verb phrases form the relation names (predicates), and noun phrases become the arguments of the relevant relations or properties.
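The mapping from a parsed clause to a first-order logic term can be illustrated as follows. The sketch assumes the syntactic stage has already produced a verb root and its noun-phrase arguments; the output notation and the normalisation of noun phrases into constants are assumptions, not X-MINIPAR's or AKEA's actual format.

```python
import re

def to_constant(phrase):
    """Normalise a noun phrase into a logic constant, e.g. 'H5N1 virus' -> 'h5n1_virus'."""
    return re.sub(r"[^a-z0-9]+", "_", phrase.lower()).strip("_")

def to_logic(verb_root, *noun_phrases):
    """The verb root becomes the predicate; the noun phrases become its arguments."""
    args = ", ".join(to_constant(np) for np in noun_phrases)
    return f"{verb_root}({args})"

print(to_logic("infect", "H5N1 virus", "wild birds"))  # infect(h5n1_virus, wild_birds)
```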
4.6.6 Semantic Interpreter


Using the definitions of concept classes, relations, properties and inference rules, the interpreter tries to fit the parsed text into the ontology by producing semantic instances that explain the meaning of the current text. These instances are then stored in the knowledge base as a semantic network.
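A minimal sketch of fitting terms into the ontology: a term is stored in the semantic network only if its arguments belong to the classes that the relation expects. The class names follow the ontology in Figure 4.8, but the instances, the relations `reported_in` and `monitors`, and their signatures are hypothetical.

```python
# Hypothetical ontology fragment: instance -> class, relation -> (subject class, object class)
INSTANCE_CLASS = {"h5n1": "disease", "asia": "location", "who": "organization"}
RELATIONS = {
    "reported_in": ("disease", "location"),
    "monitors": ("organization", "disease"),
}

def interpret(term, network):
    """Store (relation, subject, object) as a network edge if it fits the ontology."""
    relation, subject, obj = term
    if relation not in RELATIONS:
        return False  # unknown relation: nothing to fit into
    expected_subject, expected_object = RELATIONS[relation]
    if (INSTANCE_CLASS.get(subject) != expected_subject
            or INSTANCE_CLASS.get(obj) != expected_object):
        return False  # argument types do not match the relation's definition
    network.add((subject, relation, obj))
    return True

network = set()
interpret(("reported_in", "h5n1", "asia"), network)  # accepted
interpret(("reported_in", "asia", "h5n1"), network)  # rejected: wrong argument classes
print(network)  # {('h5n1', 'reported_in', 'asia')}
```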


4.6.7 Query Engine 

The query engine uses the domain-specific knowledge base to improve retrieval, to perform inferences and, to a certain extent, to assist in problem solving. Three data structures support the operation of the module, as described below:


• Metadata 

Metadata consists of information that can be inferred directly from the syntactic structure of HTML pages. This information is used to identify and describe each page, for example its title, author, content, date, keywords and category.

• Parse Rule 

The parse rule consists of rules stating how predicates and arguments are to be placed 
together to form logic representations. 

• Knowledge Base 

The knowledge base consists of three parts: a semantic network, an ontology and inference rules. The semantic network is in a form that machines can understand and manipulate. The ontology is a conceptual schema that contains the definitions of the concepts of interest, the relations between them and their properties. It can be seen as a template of classes whose instances populate the knowledge base. The inference rules are used by the query engine to infer new, implicit facts from existing ones.
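How inference rules let the query engine derive facts that were never stated can be shown with simple forward chaining over (subject, relation, object) triples. The relations `reported_in` and `part_of`, the rule and the facts below are hypothetical examples rather than AKEA's rule base.

```python
def infer(facts, rules):
    """Forward chaining: apply every rule until no new facts appear."""
    facts = set(facts)
    while True:
        new = set()
        for rule in rules:
            new |= rule(facts) - facts
        if not new:
            return facts
        facts |= new

def located_transitively(facts):
    """Rule: reported_in(X, Y) and part_of(Y, Z) => reported_in(X, Z)."""
    return {(x, "reported_in", z)
            for (x, r1, y) in facts if r1 == "reported_in"
            for (y2, r2, z) in facts if r2 == "part_of" and y2 == y}

def query(facts, subject=None, relation=None, obj=None):
    """Return the matching triples; None acts as a wildcard."""
    return sorted(f for f in facts
                  if (subject is None or f[0] == subject)
                  and (relation is None or f[1] == relation)
                  and (obj is None or f[2] == obj))

FACTS = {
    ("h5n1", "reported_in", "vietnam"),
    ("vietnam", "part_of", "south_east_asia"),
    ("south_east_asia", "part_of", "asia"),
}
kb = infer(FACTS, [located_transitively])
print(query(kb, subject="h5n1", relation="reported_in"))
```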
Figure 4.8: Ontology Structure in AKEA (screenshot of the “Structure of the Ontology” view; classes shown include domain, entity, date, disease with subclasses such as blood, circulatory, parasitic and respiratory diseases and eukaryotic, prokaryotic and viral pathogens, location, organization, person, variable and event)


As discussed in Section 3.5.3 on DKMM, AINI’s domain-specific knowledge bases are made 
up of two unit knowledge domains called Natural Language Corpus and FAQ component. 
The unit domains in the Natural Language Corpus component consist of knowledge and 
information harvested from or expressed in ontologies (see Figure 4.8), gazetteers and named 
entities (see Figure 4.9). 
Figure 4.9: Named Entity Recognition in AKEA
These have been implemented as domain-dependent modular components, which will allow future improvements in the domain knowledge. The named entity module identifies named locations, named persons, named organisations, dates, times and key measures in text; the text it processes is obtained by AKEA. To identify these entities, our system uses rules that specify the structure of named entities in terms of text tokens and information from sources such as a tagger, a morphosyntactic analyser and knowledge bases of names, clue words and abbreviations. Figure 4.9 shows an example of named entity recognition for the sentence “What are the concerns about avian influenza A (H5N1) outbreaks in Asia, Europe, the Near East, and Africa?” 85
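A rule-based recogniser of the kind described above can be sketched with a gazetteer, an abbreviation list and a clue-word pattern. All three knowledge sources below are small hypothetical examples, and the sketch omits the tagger and morphosyntactic analyser; applied to the example sentence, it finds the four location names.

```python
import re

# Hypothetical knowledge bases: a gazetteer of place names and a list of abbreviations
LOCATIONS = ["the Near East", "Asia", "Europe", "Africa"]
ABBREVIATIONS = {"WHO": "organisation", "CDC": "organisation"}
# Clue-word rule: capitalised words (other than "The") ending in an organisational clue word
ORG_RULE = re.compile(r"\b(?:(?!The\b)[A-Z][a-z]+ )+(?:Organization|Organisation|Centers|Department)\b")

def recognise(sentence):
    """Return (entity text, entity type) pairs in order of appearance."""
    found = []
    for name in LOCATIONS:
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", sentence):
            found.append((m.start(), m.group(), "location"))
    for abbr, label in ABBREVIATIONS.items():
        for m in re.finditer(r"\b" + re.escape(abbr) + r"\b", sentence):
            found.append((m.start(), m.group(), label))
    for m in ORG_RULE.finditer(sentence):
        found.append((m.start(), m.group(), "organisation"))
    return [(text, label) for _, text, label in sorted(found)]

SENTENCE = ("What are the concerns about avian influenza A (H5N1) outbreaks "
            "in Asia, Europe, the Near East, and Africa?")
print(recognise(SENTENCE))
```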

4.7 Summary 

This chapter has discussed how WKTM can be used to evaluate the trustworthiness of websites as the objects of trust for the development of CAs’ knowledge bases. This chapter has also resolved many of the issues raised by the first full implementation of AKEA during the SARS crisis in 2003 [110-112], in which knowledge extraction was based on the popularity of websites instead of their trustworthiness.


Based on the proposal and the experiment described in this chapter, the contributions of this chapter are:

• A procedure for selecting trustworthy websites for building a conversation agent’s knowledge bases is proposed.

• A scheme for selecting an “unbiased seed set” for building a corpus has been presented.

• A Web Knowledge Trust Model (WKTM) for determining reputable, credible, reliable and accountable websites is proposed.

• Results of an evaluation based on 1,428 Bird Flu Pandemic websites crawled via the Google API are presented and discussed. Statistics on hit frequency and a substantial data collection based on PageRank and Stanford Web Credibility are reported. The corpus is also used to evaluate the proposed WKTM.

85 The example sentence was extracted from http://www.pandemicflu.gov/faci/avianinfluenza/1235.html

These contributions indicate that this novel approach contributes towards building restricted-domain knowledge for CAs based on WKTM. The proposed model demonstrates that the credibility of websites can be defined in a way that is probably closer to a realistic expectation of trustworthiness. The course of this research now turns to the evaluation of the CA framework. The practical development, implementation and testing of the framework are dealt with in the following chapters.
CHAPTER 5


AN EVALUATION OF CONVERSATION AGENTS FRAMEWORK 


5.1 Introduction 

The growth of conversation systems is limited by a lack of evaluation, especially of the performance and the quality of responses of CAs. The Turing Test, for instance, is a proposal to test a machine’s capability to demonstrate intelligence in five minutes. In the Loebner Prize contest, judges evaluate which of the entered CAs is the most humanlike in ten minutes. For a practical CA operating without a time limit, a novel methodology based on qualitative and quantitative approaches is proposed. In evaluating a CA framework, quantitative means such as statistics and facts are clearly required to decide whether a development is a step forward and whether it is worth the effort.

In this research, the conventional approach based on laboratory experiments is used first.
This is followed by a real-time experiment, whose results are compared with previous
empirical research. The laboratory experiments are reported in this chapter, and the
real-time experiment is described in Chapter 6 and Chapter 7. In addition, a mixture of
quantitative and qualitative methods is used in the evaluation, as described in the following
sections.

5.2 An Evaluation of the Parsers 

As discussed in Section 3.5.4.3, a modified version of the off-the-shelf parser MINIPAR
[211], called X-MINIPAR, was used as the parser in AINI’s NL-Query module. X-MINIPAR
was compared with other popular parsers, namely the CMU Link Grammar parser [277] and
the Stanford parser [278]. These three parsers represent a cross-section of approaches to
producing dependency analyses. X-MINIPAR uses a constituency grammar internally before
converting the result to a dependency tree. CMU Link Grammar is based on link grammar,
and the Stanford parser is an unlexicalised statistical syntactic parser. With the emergence of
broad-coverage parsers, quantitative evaluation of parsers has become increasingly
important. Firstly, such an evaluation scheme is necessary to measure progress in the field of
broad-coverage parsing quantitatively and to compare and evaluate different parsing
techniques. Secondly, in the development of a broad-coverage parser, it is usually very
difficult to predict the consequences of a change to the parser or the grammar. An attempt to
extend the coverage that is motivated by a few examples may well cause the parser to
overgenerate or to lose coverage in other areas. This makes the evaluation biased and
difficult to compare. Thirdly, efficiency and coverage are often conflicting goals for a parser,
and a meaningful trade-off can only be reached if both can be measured precisely. Finally,
quantitative evaluation may provide crucial information for determining the suitability of a
parser for a particular domain or task.

In this evaluation process, the three parsers were installed and an extrinsic evaluation was
applied, as described by Bangalore et al. [279]. Extrinsic evaluation is usually used as an
indirect method for comparing parsing systems. This approach is acceptable even if the
parsers produce different output representations, as long as each output can be converted
into a form usable by the system. The parsers are evaluated in terms of performance and
accuracy of the output.
5.2.1 Performance of the Parsers


To evaluate the comparative performance of the parsers, 1,428 uncategorised
pandemic bird flu web documents were extracted and wrapped using the Google API 86 and
the BootCAT Toolkit [268]. The process is detailed in Section 4.5.3. Regular expressions
are used to filter out the HTML markup and to extract well-formed sentences. From the
1,428 pages, the first sentence of each of the first 150 pages was extracted and parsed by
X-MINIPAR, CMU Link Grammar and the Stanford parser. Each sentence has an average of
50 words. Applying this uniform policy to all three parsers allows a fair comparison between
the parsing techniques. In this experiment, the composition or syntactic structure of these
sentences is the main focus. The performance of the three systems varied across different
argument types. CMU Link Grammar took an average of 0.7 seconds to parse a sentence, the
Stanford parser 0.5 seconds and X-MINIPAR 0.2 seconds 87. X-MINIPAR therefore gave the
highest performance, as it was the fastest. Its throughput (about 50 words in 0.2 seconds, or
roughly 250 words per second) is of the same order as the original MINIPAR evaluation on
the SUSANNE corpus. In that evaluation, MINIPAR parsed newspaper text at about 500 words
per second on a Pentium III 700 MHz machine with 500 MB of memory [211].
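The sentence-sampling step described above (strip the HTML with regular expressions, then keep the first well-formed sentence of each of the first 150 pages) can be sketched as follows. This is a minimal illustration, not the regular expressions actually used in this study. It assumes that a "well-formed" sentence starts with a capital letter and ends with terminal punctuation, and the function names are hypothetical.

```python
import html
import re

# Remove <script>/<style> blocks first, since their contents are not prose.
SCRIPT_OR_STYLE = re.compile(r"<(script|style)\b.*?</\1>", re.IGNORECASE | re.DOTALL)
TAG = re.compile(r"<[^>]+>")  # any remaining HTML tag
# Assumption: a "well-formed" sentence starts with a capital letter and
# ends with terminal punctuation.
SENTENCE = re.compile(r"[A-Z][^.!?]*[.!?]")


def first_sentence(page):
    """Strip HTML from one page and return its first well-formed sentence, or None."""
    text = SCRIPT_OR_STYLE.sub(" ", page)
    text = TAG.sub(" ", text)
    text = html.unescape(text)          # turn entities such as &amp; back into text
    text = re.sub(r"\s+", " ", text)    # collapse whitespace left by removed tags
    match = SENTENCE.search(text)
    return match.group(0).strip() if match else None


def sample_sentences(pages, limit=150):
    """First sentence of each of the first `limit` pages, skipping pages with none."""
    sentences = []
    for page in pages[:limit]:
        sentence = first_sentence(page)
        if sentence is not None:
            sentences.append(sentence)
    return sentences
```

Sampling one sentence per page, rather than many sentences from a few pages, keeps a single long document from dominating the test set.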

Table 5.1: Performance Test for X-MINIPAR, CMU Link Grammar and the Stanford Parser

Number of sentences extracted from web documents: 150

                                 Stanford Parser   CMU Link Grammar   X-MINIPAR
Total time (150 sentences)       75 seconds        105 seconds        30 seconds
Average time per sentence        0.5 seconds       0.7 seconds        0.2 seconds
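The per-sentence averages in Table 5.1 follow directly from the totals. An approximate throughput can also be estimated from the reported 50-word average sentence length. The short script below reproduces that arithmetic. The throughput figure is only an estimate, because only the average sentence length is known, not the exact word count.

```python
SENTENCES = 150   # sentences parsed by each parser (Section 5.2.1)
AVG_WORDS = 50    # reported average sentence length (Section 5.2.1)

TOTAL_SECONDS = {"Stanford Parser": 75, "CMU Link Grammar": 105, "X-MINIPAR": 30}


def seconds_per_sentence(total, n=SENTENCES):
    """Average parse time per sentence."""
    return total / n


def words_per_second(total, n=SENTENCES, words=AVG_WORDS):
    """Estimated throughput, assuming every sentence has the average length."""
    return n * words / total


for parser, total in TOTAL_SECONDS.items():
    print(f"{parser}: {seconds_per_sentence(total):.1f} s/sentence, "
          f"{words_per_second(total):.0f} words/s")
# Stanford Parser: 0.5 s/sentence, 100 words/s
# CMU Link Grammar: 0.7 s/sentence, 71 words/s
# X-MINIPAR: 0.2 s/sentence, 250 words/s
```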


5.2.2 Accuracy of the Parsers 

The accuracy of a parser depends on the formalism it uses to model language and the
corresponding output it produces. Dependency parsers model language as a set of
relationships between words, and they do not make widespread use of concepts such as
'phrase' or 'clause'. Dependency parsers are popular in applied NLP circles. The grammatical
relationships that dependency parsers specify are similar to the semantic relationships,
expressed as logical predicates, to which NLP developers reduce a sentence. From the
parsed output, dependency graphs 88 can be created, representing how the words in a
sentence govern or depend on one another.

86 http://www.google.com/apis

87 In this experiment, a Dell Precision PWS380 Server (3 GHz, 1 GB of memory) was used

For this accuracy test, frequently asked questions (FAQs) were collected from the
“who.int” and “pandemicflu.gov” websites and applied to the three parsers. A total of 158
FAQs were used in this evaluation. An example is shown below, with the full sentence
parsing given in Appendix B:

“Bird flu did occur in which countries?” 

The CMU Link Grammar parse output shown in Figure 5.1 indicates that the second-last
word has been left untagged. Figure 5.1 also shows the constituent output of the parse, and
Figure 5.2 depicts the dependency graph showing the part of speech of each word, where one
is assigned. The word “which” failed to be tagged because of the parser's null-links
feature.

[(bird.n)(flu.n)(did.v)(occur.v)(in)([which])(countries.n)]

[[0 1 0 (AN)] [1 2 0 (Ss)] [2 3 0 (I*d)] [3 4 0 (MVp)] [4 6 0 (Jp)]]

[0]

Figure 5.1: Parse output of the CMU Link Grammar 



Figure 5.2: Dependency Graphs generated by CMU Link Grammar 
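The null link on “which” can be read directly from the Figure 5.1 output. The word list marks a null-linked word with square brackets, and no link in the linkage list touches its index. The sketch below parses the two lines of Figure 5.1 into words and (left, right, label) links and reports the unlinked positions. It assumes that the first two integers of each link are word indices, as the figure suggests. The third integer is ignored because its meaning is not stated here. This is an illustration, not the Link Grammar API.

```python
import re

# Figure 5.1 output, reproduced as strings for the example.
WORDS = "[(bird.n)(flu.n)(did.v)(occur.v)(in)([which])(countries.n)]"
LINKS = "[[0 1 0 (AN)] [1 2 0 (Ss)] [2 3 0 (I*d)] [3 4 0 (MVp)] [4 6 0 (Jp)]]"


def parse_words(text):
    """Return the word tokens, e.g. 'bird.n' or '[which]'."""
    return re.findall(r"\(([^()]*)\)", text)


def parse_links(text):
    """Return (left_index, right_index, label) per link; the third integer is ignored."""
    return [(int(left), int(right), label)
            for left, right, _unused, label
            in re.findall(r"\[(\d+) (\d+) (\d+) \(([^)]*)\)\]", text)]


def unlinked(words, links):
    """Indices of words that take part in no link (null-linked words)."""
    linked = {i for left, right, _ in links for i in (left, right)}
    return [i for i in range(len(words)) if i not in linked]


words, links = parse_words(WORDS), parse_links(LINKS)
print([words[i] for i in unlinked(words, links)])  # ['[which]']
```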


88 The dependency graphs were generated using the MINIPAR, CMU Link Grammar and Stanford
parser parse visualisation tool, which can be obtained at http://cgi.stanford.edu
(ROOT
  (SINV
    (FRAG
      (NP (NN bird) (NN flu)))
    (VP (VBD did)
      (S
        (VP (VB occur)
          (FRAG
            (WHPP (IN in)
              (WHNP (WDT which)))))))
    (NP (NNS countries))))


Figure 5.3: Parse output of the Stanford parser



Figure 5.4: Dependency Graphs generated by Stanford Parser 


(
E2 (() U * )
E0 (() fin C E2 )
1 (Bird ~ N 2 nn (gov flu))
2 (flu ~ N 4 s (gov occur))
3 (did do Aux 4 aux (gov occur))
4 (occur ~ V E0 i (gov fin))
E3 (() flu N 4 subj (gov occur) (antecedent 2))
5 (in ~ Prep E2 p)
6 (which ~ Det 7 det (gov country))
7 (countries country N 5 pcomp-n (gov in))
)


Figure 5.5: Parse output of the X-MINIPAR
[Dependency graph; recoverable node labels: occur (V), in (Prep)]

Figure 5.6: Dependency Graphs generated by X-MINIPAR


In the above example, X-MINIPAR and the Stanford parser correctly parse every word. The
constituents can easily be produced from the grammatical relationships listed in Figure 5.3
and Figure 5.5 respectively. In terms of dependency representation, the Stanford parser and
X-MINIPAR also produced the dependency graphs that would yield the best results, as
shown in Figure 5.4 and Figure 5.6 respectively. X-MINIPAR and the Stanford parser use
different types of grammar (principle-based and statistical, respectively), and therefore they
produce different types of output. Both the Stanford parser [278] and MINIPAR [211], on
which X-MINIPAR is based, have been used successfully in the past, and both performed well
in this study. However, using two parsers means the system requires an additional
component to reconcile their outputs. Building a CA system is not straightforward, and an
incremental implementation should remove some of the complexity. The use of multiple
parsers is one aspect that makes the system more complex. Therefore, the first
implementation increment of the system may be better served by using only one parser.


In this case, X-MINIPAR added richness to the representation, making it possible for the CA
system to generate a more accurate machine-readable sentence in reply to the database
language query. The X-MINIPAR extraction code was optimised to 90 lines, compared with
around 300 lines of code for CMU Link Grammar. This code extracts the syntactic
categories and tokens for visual scrutiny, as shown in Figure 5.1 and Figure 5.5. Moreover,
the morphological roots of verbs and nouns can be obtained directly from the X-MINIPAR
output (for example, "did" to "do" and "countries" to "country" in Figure 5.5). This
eliminates the need for a separate morphology analyser. In addition, the X-MINIPAR path is
both shorter and simpler for the same predicate-argument relationship. It could be encoded in
various ways that take advantage of the additional semantic and lexical information that is
provided. Using both a syntactic and a semantic grammar parser in series should also improve
the richness of the natural language interpretation implemented in the natural language
understanding and reasoning of the AINI conversation system. The performance observed in
these tests appears to be largely attributable to the modifications made in X-MINIPAR.
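To illustrate how syntactic categories and morphological roots can be read straight from X-MINIPAR output, the sketch below parses the Figure 5.5 lines into (word, lemma, category, relation, head) tuples. It assumes the field order visible in the figure: index, word, root ("~" meaning the root equals the word), category, head index and relation. It handles only the numbered, word-bearing lines and skips the empty E-nodes. It is not the 90-line extraction code mentioned above.

```python
import re

# Figure 5.5 output, reproduced as a string for the example.
MINIPAR_OUTPUT = """(
E2 (() U * )
E0 (() fin C E2 )
1 (Bird ~ N 2 nn (gov flu))
2 (flu ~ N 4 s (gov occur))
3 (did do Aux 4 aux (gov occur))
4 (occur ~ V E0 i (gov fin))
E3 (() flu N 4 subj (gov occur) (antecedent 2))
5 (in ~ Prep E2 p)
6 (which ~ Det 7 det (gov country))
7 (countries country N 5 pcomp-n (gov in))
)"""

# Numbered word lines: index (word root category head relation ...
WORD_LINE = re.compile(r"^(\d+) \((\S+) (\S+) (\S+) (\S+) ([^\s()]+)")


def parse_minipar(text):
    """Return one (word, lemma, category, relation, head) tuple per numbered word."""
    nodes = {}
    for line in text.splitlines():
        m = WORD_LINE.match(line.strip())
        if m is None:
            continue  # skip the bracket lines and the empty E-nodes
        idx, word, root, cat, head, rel = m.groups()
        lemma = word if root == "~" else root  # '~' means the root equals the word
        nodes[idx] = (word, lemma, cat, head, rel)
    rows = []
    for word, lemma, cat, head, rel in nodes.values():
        # Heads that are empty nodes (E0, E2) are kept as their labels.
        head_word = nodes[head][0] if head in nodes else head
        rows.append((word, lemma, cat, rel, head_word))
    return rows


for row in parse_minipar(MINIPAR_OUTPUT):
    print(row)
```

The lemma column is what removes the need for a separate morphology analyser: "did" comes back as "do" and "countries" as "country".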


5.3 An Evaluation of the Performance of Conversation Agents

At the time this evaluation was carried out, no other research had reported a performance
evaluation of CAs, apart from question-answering systems. However, the results of previous
evaluations of question-answering systems such as AnswerBus [95] and NaLURI [212] are
used in this evaluation as a benchmark, to ensure that the results obtained here do not
deviate too significantly. In this quantitative performance evaluation of the AINI
conversation system, a set of 98 stimulus questions was used. The questions were extracted
from the transcripts of three-time Loebner Prize winner ALICE [280]. They were collected
from conversations between four judges and ALICE in the 2001 contest at the Science
Museum, London. These judges' questions were used to simulate conversations with AINI,
ALICE, ALICE Silver Edition and ELIZA remotely over the World Wide Web 89.

89 This experiment was carried out on 10th February 2007 using the respective CA URLs:

ELIZA at http://www-ai.iis.si/ELIZA

ALICE at http://pandorabots.com/pandora/talk-xml?botid=890e7e46de354b3c

ALICE Silver Edition at http://www.pandorabots.com/pandora/talk?botid=f5d922d97e345aal

AINI at http://ainibot.org

The response time for each question submitted to these four CAs was recorded, collected and
analysed for its average and standard deviation. The pattern of response times for the four
systems is depicted in Figure 5.7.

[Plot of response time for each stimulus question; x-axis: Question Number; series: AINI, ALICE, ALICE Silver Edition, ELIZA]

Figure 5.7: Response times for AINI, ALICE, ALICE Silver Edition and ELIZA

The maximum, minimum and average response times, and the standard deviations, obtained
from this evaluation are displayed in Table 5.2.
Table 5.2: Average response time and standard deviation for AINI, ALICE, ALICE Silver Edition and ELIZA

                                    AINI       ALICE      ALICE Silver Edition   ELIZA
Average time (seconds)              1.4527     0.3688     0.72482                0.1061
Minimum time (seconds)              0.7292     0.6349     0.6370                 0.0699
Maximum time (seconds)              14.7943    4.0976     13.9578                1.2986
Standard deviation (seconds)        1.7240     0.7773     1.9864                 0.21023
Knowledge base (stimulus-response   161,473    40,000     120,000                200
categories; % of combined total)    (50.20%)   (12.43%)   (37.30%)               (0.06%)
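The summary statistics in Table 5.2 can be produced with a small harness. It times each of the 98 questions and reduces the timings to an average, minimum, maximum and standard deviation. The sketch below is a minimal illustration, assuming a hypothetical `ask` callable that submits one question to a CA's web interface and returns the reply. It uses the sample standard deviation, because the thesis does not state which form was used.

```python
import statistics
import time


def time_responses(ask, questions):
    """Submit each question through `ask` and record wall-clock response times (seconds)."""
    times = []
    for question in questions:
        start = time.perf_counter()
        ask(question)
        times.append(time.perf_counter() - start)
    return times


def summarise(times):
    """Table 5.2-style summary of a list of response times."""
    return {
        "average": statistics.mean(times),
        "minimum": min(times),
        "maximum": max(times),
        # Sample standard deviation (n - 1); an assumption, see the lead-in.
        "std_dev": statistics.stdev(times),
    }


def consistent(summary):
    """A mean always lies between the minimum and the maximum of the same sample."""
    return summary["minimum"] <= summary["average"] <= summary["maximum"]
```

Because the mean of a sample always lies between its minimum and maximum, `consistent` gives a cheap check on any reported row of such a table.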


The response times from this evaluation show outliers similar to those reported in previous
research [212]. The response time for each query is expected to depend on the number of
response categories in each CA's knowledge base. For instance, AINI has a total of 161,473
stimulus-response categories, whereas the original ELIZA has only 200 [35]. ALICE
standard edition [41], which was ranked the "most human" computer, has about 40,000
response categories from its AAA. ALICE Silver Edition [102] has about 120,000 response
categories, including a further 80,000 taken from MindPixel.


As shown in Table 5.2, ELIZA has the fastest average response time, followed by ALICE,
ALICE Silver Edition and AINI. This ordering is consistent with the number of
stimulus-response categories in their knowledge bases. However, the response time for
AINI was comparable with that of ALICE Silver Edition. AINI holds 12.9 percentage points
more of the combined stimulus-response categories than ALICE Silver Edition (50.20%
versus 37.30%). Even so, AINI's response time ranged from 0.7292 to 14.7943 seconds,
compared with 0.6370 to 13.9578 seconds for ALICE Silver Edition. The standard deviations
are also similar: 1.7240 seconds for AINI and 1.9864 seconds for ALICE Silver Edition.
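The percentages in the knowledge-base row of Table 5.2 are shares of the combined total of all four knowledge bases (321,673 categories). The "12.9%" gap between AINI and ALICE Silver Edition is therefore a difference in percentage points. In relative terms, AINI has about 34.6% more categories. The short script below reproduces both figures from the category counts in Table 5.2.

```python
CATEGORIES = {
    "AINI": 161_473,
    "ALICE": 40_000,
    "ALICE Silver Edition": 120_000,
    "ELIZA": 200,
}


def shares(counts):
    """Percentage of the combined total held by each knowledge base."""
    total = sum(counts.values())
    return {name: 100 * n / total for name, n in counts.items()}


pct = shares(CATEGORIES)
gap = pct["AINI"] - pct["ALICE Silver Edition"]                                   # percentage points
relative = 100 * (CATEGORIES["AINI"] / CATEGORIES["ALICE Silver Edition"] - 1)   # percent

print(f"AINI {pct['AINI']:.2f}%, ALICE Silver Edition {pct['ALICE Silver Edition']:.2f}%")
print(f"gap {gap:.1f} points; AINI has {relative:.1f}% more categories")
# AINI 50.20%, ALICE Silver Edition 37.30%
# gap 12.9 points; AINI has 34.6% more categories
```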


The only major difference is the average response time, which may be caused by the number
of stimulus-response categories in each CA. AINI's average response time (1.4527 seconds)
was double that of ALICE Silver Edition (0.72482 seconds). This should be weighed against
AINI's larger knowledge base, which holds about 13 percentage points more of the combined
stimulus-response categories than ALICE Silver Edition. In addition, the introduction of
natural language understanding and advanced reasoning components improves the quality of
question answering without unduly compromising response time. These components can
originate either from new ideas or from innovative use of existing concepts. As Table 5.3
shows, the complexity and demands of the system increase as more components are
included, but at the same time the quality of the responses produced also improves. This can
be attributed to the fact that more and more computation is performed on a decreasing
amount of information. The aim is to exploit more aspects of natural language and achieve a
richer meaning representation.


Table 5.3: Natural language query components in AINI compared with the ELIZA and ALICE conversation systems

Components                                    ELIZA   ALICE/ALICE Silver Edition   AINI
Spelling checker                              -       -                            ✓
Pattern matching                              ✓       ✓                            ✓
Case-based reasoning                          -       ✓                            ✓
Index search                                  -       -                            ✓
Natural language understanding and reasoning  -       -                            ✓
Dynamic response generation                   -       ✓                            ✓
Supervised learning                           -       -                            ✓


This may explain the results. ELIZA is a system of an entirely different class from ALICE
and AINI. While AINI appears to be slower, it has to carry out a larger amount of processing,
as shown in Table 5.3. In general, Figure 5.7 shows that the response time of AINI is similar
to those of the other systems, which appear to require less demanding resources and
processing. The results also show that AINI's response time is consistent despite uncertainty
in the type of questions. This is illustrated by its lower standard deviation compared with
ALICE Silver Edition. Consistency matters because AINI was designed to handle questions of
both an open-domain and a domain-specific nature. Other existing CA systems, by contrast,
are restricted either to a specific domain, such as the virtual therapist in ELIZA, or to an
unrestricted domain, as with ALICE.


5.4 A Comparison of Response Quality for Query Systems: Search Engines, Question-
Answering Systems and Conversation Systems

In this evaluation, three types of query systems were compared, and the results were
published in [223]. They are: (a) search engines, (b) question-answering systems, and (c)
conversation systems. For each type of query system, two different engines are compared
against AINI. The two search engines compared are Google and Yahoo. For the
question-answering systems, AskJeeves and START are used, as both are intended to
process queries posed in natural language. For the conversation systems, ELIZA and ALICE
are selected.

Google is a well-known search engine which determines the relevance of information
primarily by its PageRank algorithm [281, 282]. In this experiment, a query interface using
the Google SOAP Search API service [283] and Yahoo! [284] was developed. For the
question-answering systems, the idea behind AskJeeves and START is to allow users to get
answers to questions posed in natural language. Ask Jeeves was the first commercial
question-answering system available on the Internet. START [285] is the world's first
web-based question-answering system, and commenced operation in December 1993. As for
the conversation systems, ELIZA is a well-known program in the discipline of AI and is also
the oldest system of its type. ALICE [214] is a general conversation system based on the
AAA, and its knowledge base rule set consists of approximately 46,424 categories. In a way,
AINI can also be considered an enhancement of ALICE. It adds the domain-specific
knowledge base for the bird flu pandemic, which was extracted by AREA as discussed in
Chapter 4. AINI also has parsing capabilities based on a full NLUR engine for multilevel
natural language query [192]. The control question set was submitted to the seven URLs
where the seven systems were located. The responses to the queries were then collected and
displayed as illustrated in Figure 5.8.
Figure 5.8: Experimental Design Interface for Response Quality
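The interface in Figure 5.8 submits one control question to seven systems and collects the replies side by side. A minimal sketch of that fan-out is shown below. It assumes each system is wrapped in a hypothetical adapter callable that takes the question and returns the reply text; the real interface used the services' own web APIs and pages, which are not reproduced here. The questions are sent concurrently so that one slow system does not hold up the others. Note that the thread pool still waits for any slow call to finish when the `with` block exits.

```python
from concurrent.futures import ThreadPoolExecutor


def query_all(systems, question, timeout=30.0):
    """Send one question to every system concurrently and collect the replies.

    `systems` maps a display name to a callable taking the question and
    returning the reply text (hypothetical adapters for each web interface).
    """
    with ThreadPoolExecutor(max_workers=max(1, len(systems))) as pool:
        futures = {name: pool.submit(ask, question) for name, ask in systems.items()}
        replies = {}
        for name, future in futures.items():
            try:
                replies[name] = future.result(timeout=timeout)
            except Exception as exc:  # a failing or slow system should not abort the run
                replies[name] = f"<no reply: {exc!r}>"
    return replies
```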

In the course of this evaluation, the question arose of how to compare the response quality
of the three types of query system. It was decided that the response quality of each system
should be judged on the basis of keywords, phrases or sentences. For example, when the
question "What is bird flu?" is asked, the responses from the search engines,
question-answering engines and conversational engines are as summarised in Tables 5.4, 5.5
and 5.6 respectively. The responses are noted as follows:
• Search Engines - These systems identify the keywords and return documents that are
relevant to the query. The results can run to tens of thousands or even millions of hits, as
shown in Table 5.4.


Table 5.4: Responses from two search engines - Google and Yahoo

SEARCH ENGINES

ENGINE    RESPONSES

Google    Reply 7.15 million pages. First hit was "Facts about previous flu pandemics,
          and key facts about bird flu and the avian influenza A (H5N1) virus.."

Yahoo     Reply 27.6 million pages. First hit was "Influenza viruses that infect birds
          are called avian influenza viruses ... East Asia have gotten sick with a
          particular avian flu strain (called H5N1). This virus has ..."


• Natural Language Question-Answering Systems - These generate responses using
NLP and return fewer possible answers than search engines. The answer is likely to lie
within the context of the returned document. Sample responses are shown in Table 5.5.


Table 5.5: Responses from popular question-answering systems

QUESTION ANSWERING SYSTEMS

ENGINE      RESPONSES

AskJeeves   Reply 4.183 million pages. First hit was "The H5N1 strain of avian influenza
            has alarmed health officials around the world. People in close contact with
            live birds have contracted the disease, raising concerns that it may change
            into a highly contagious form that can be transmitted from human to human."

START       Main Entry: bird flu
            Function: noun: AVIAN INFLUENZA
            Source: Merriam-Webster Dictionary



• Conversation Systems - These provide a quick, single response to the stimulus, using
simple pattern matching and NLP based on a surprisingly small number of predefined rules.
An example is shown in Table 5.6.

Table 5.6: Responses from conversation systems

CONVERSATION SYSTEMS

ENGINE   RESPONSES

ELIZA    Does that question interest you?

ALICE    I haven't heard of bird flu.

AINI     Avian influenza, or "bird flu", is a contagious disease of animals caused by
         viruses that normally infect only birds and, less commonly, pigs. Avian
         influenza viruses are highly species-specific, but have, on rare occasions,
         crossed the species barrier to infect humans.
         http://www.who.int/csr/disease/avian_influenza/avian_faqs/en/index.html


In this example, the ELIZA conversation system responded with “Does that question interest
you?” ELIZA asks another question instead of giving an answer, with the objective of
encouraging the user to continue the conversation. ALICE, on the other hand, attempts to
convince the user by generating random answers from its AAA knowledge base. ALICE's
response does not need a grammatical parser, as her knowledge base contains the pattern
"WHAT IS BIRD FLU?", and the witty reply is produced by the “XFIND *” AIML category.
Through the “XFIND *” pattern, ALICE randomly generates responses such as “Is there only
one”, “Let me think about it.”, “Have you tried a web search?” and “I haven’t heard of bird
flu.” The pattern matching language used in ALICE permits only one wildcard ('*') match
character per pattern. Even so, such wildcard patterns let ALICE respond to a wide variety of
user inputs. ALICE is not concerned with whether it really "understands" the input. It aims to
provide a coherent response to the client in order to convey the impression that the system
understands the client's intention. Neither ELIZA nor ALICE is able to handle questions that
demand specific answers; they are simply not designed for such a purpose. The three possible
ways to handle these types of questions are:

• Analyse the problems with NLP and then provide an appropriate answer, 

• Rely on a human to review the conversation logs and continually improve the 
knowledge base, or 

• Treat the query as impossible and then choose a pre-defined random answer. 
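ALICE's behaviour in this example can be sketched in a few lines. A pattern with a single wildcard captures the topic, and a redirect sends it to an "XFIND *" category that picks a random canned reply. The sketch below is a toy illustration under stated assumptions, not ALICE's AIML interpreter. The category table is hypothetical and only loosely mimics AIML's `<srai>` redirection and `<random>` element, and only the reply strings are taken from the text above.

```python
import random
import re

# Hypothetical AIML-like categories. A template starting with "SRAI " redirects
# to another pattern (loosely like AIML <srai>); a list means a random choice.
CATEGORIES = [
    ("WHAT IS *", "SRAI XFIND *"),
    ("XFIND *", [
        "Is there only one",
        "Let me think about it.",
        "Have you tried a web search?",
        "I haven't heard of {star}.",
    ]),
]


def normalise(text):
    """Upper-case and drop punctuation, roughly like AIML input normalisation."""
    return " ".join(re.sub(r"[^A-Z0-9 ]", " ", text.upper()).split())


def match(pattern, text):
    """Match a pattern with at most one '*'; return the captured words, '' or None."""
    parts = pattern.split("*")
    if len(parts) > 2:
        raise ValueError("only one wildcard per pattern is supported")
    m = re.fullmatch("(.+)".join(re.escape(p) for p in parts), text)
    if m is None:
        return None
    return m.group(1) if m.groups() else ""


def respond(user_input, depth=0):
    """Return the first matching category's reply, following redirects."""
    text = normalise(user_input)
    for pattern, template in CATEGORIES:
        star = match(pattern, text)
        if star is None:
            continue
        if isinstance(template, list):
            template = random.choice(template)
        if template.startswith("SRAI ") and depth < 10:
            return respond(template[5:].replace("*", star), depth + 1)
        return template.format(star=star.lower())
    return "I have no answer for that."
```

No parsing takes place. "What is bird flu?" is simply captured as "BIRD FLU" and echoed into one of the canned replies, which is why such replies cannot carry a specific answer.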

For the AINI conversation system, the response was “Avian influenza, or “bird flu”, is a
contagious disease of animals caused by viruses...” The answer was generated from the
domain-specific knowledge base using Level 1 NLUR parsing. In this query, the answers
were discovered by AINI from trusted sites such as the WHO (World Health Organisation).
In addition, the response is based on natural language understanding and reasoning. AINI's
reasoning mechanism is based on answer discovery in a layer-oriented knowledge base. The
systems used in this evaluation were built with different objectives in mind. Nevertheless,
the purpose of this evaluation is to show that there is a need for the proposed system to
handle domain-specific applications. At the same time, this evaluation also shows that the
proposed conversation agent architecture can achieve the expected objectives.
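Answer discovery in a layer-oriented knowledge base can be pictured as a cascade: try the domain-specific layer first, then the open-domain layer, and only then fall back to a predefined reply (the last of the three handling strategies listed earlier). The sketch below is a hypothetical illustration of that idea, not AINI's NLUR implementation. The layer names and their order are assumptions, the stored entries are placeholders, and keyword overlap (Jaccard similarity) stands in for real natural language understanding.

```python
import random
import re


def tokens(text):
    """Lower-case word tokens, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


# Hypothetical layers, searched in order: domain-specific first, then open-domain.
LAYERS = [
    ("domain-specific", {
        "what is bird flu": 'Avian influenza, or "bird flu", is a contagious disease of animals ...',
    }),
    ("open-domain", {
        "what is your name": "(open-domain answer)",
    }),
]

FALLBACKS = ["Could you rephrase the question?", "Let me think about it."]


def jaccard(a, b):
    """Share of words common to both sets (0.0 when both are empty)."""
    return len(a & b) / len(a | b) if a | b else 0.0


def discover(question, threshold=0.75):
    """Return (layer, answer) from the first layer with a close enough stored question."""
    q = tokens(question)
    for layer, kb in LAYERS:
        best_score, best_answer = max(
            ((jaccard(q, tokens(stored)), answer) for stored, answer in kb.items()),
            default=(0.0, None),
        )
        if best_answer is not None and best_score >= threshold:
            return layer, best_answer
    return "fallback", random.choice(FALLBACKS)
```

Searching the domain-specific layer first means that a trusted-site answer, when one exists, takes precedence over a generic conversational reply.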

5.5 Summary 

In this chapter, we have presented an evaluation of the CA system, AINI, using quantitative
and qualitative approaches in laboratory experiments. It is not practical to develop a CA from
scratch. Therefore, X-MINIPAR was compared for performance and accuracy with other
natural language parsers, namely CMU Link Grammar and the Stanford parser. It was then
selected and integrated into the NLUR component of AINI’s framework.

Using the available natural language parser, the START and AnswerBus question-answering
systems were compared with the AINI conversation system in a quality evaluation. Initial
results have shown that AINI is comparatively better in terms of the quality of the responses
generated. One of the factors contributing to AINI's higher score is its capacity to generate
useful responses dynamically using two advanced reasoning components. These are
explanation on failure and dynamic answer generation, which cater for the case when no
answers are available.
In the quantitative approach for performance evaluation of the CA, AINI was compared with
three other CAs, namely ELIZA, ALICE and ALICE Silver Edition. As expected from its
simple design, ELIZA's response time was the fastest, followed by ALICE, ALICE Silver
Edition and AINI. This can be attributed to the greater computation performed by AINI.
AINI's slower response time reflects both its larger knowledge base and the introduction of
the NLUR components, which improve the quality of the responses. AINI's knowledge base
holds 12.9 percentage points more of the combined stimulus-response categories than ALICE
Silver Edition.

Lastly, an evaluation of the response quality of three types of query system was undertaken.
These were search engines (Google and Yahoo), question-answering systems (AskJeeves and
START) and conversation systems (ELIZA and ALICE). For each type, two different
engines were compared against AINI. Search engines and question-answering systems are
not suitable in this case study because of their propensity to return thousands or millions of
hits, or links to relevant and irrelevant documents. The response quality of the conversation
systems is considered poor because they are not able to handle questions that expect specific
answers. Neither ELIZA nor ALICE is concerned with whether it really “understands” the
context of the question. Both merely aim to provide a coherent response to the user in order
to convey the impression that they understand the user’s intention. AINI, however, generated
its response using NLUR and answers discovered from trusted websites extracted by AREA.
The positive results of this laboratory experiment enabled AINI to go online for a real-time
public experiment, which is described in the next two chapters.
CHAPTER 6


AN ANALYSIS OF THE LINGUISTIC FEATURES FROM REAL-TIME 
HUMAN-MACHINE INTERACTION 


6.1 Introduction 

This chapter describes the evaluation of the AINI conversation system (termed the machine)
using an approach similar to that of the Loebner Prize competition, as discussed in Section
2.3. However, the evaluation in this study is based on real-time conversation between AINI
and online users of MSN Messenger. This differs from the controlled environment and
10-minute time limit of the Loebner Prize competition. In this experimental setting, users are
free to ask any question in any domain, without time limits or boundaries. In addition, the
objective of AINI is to provide relevant answers and to prolong the conversation, rather than
to disguise itself as a human.

The goal of this study is to evaluate the use of natural language by collecting unbiased user
expressions as users engage in user-generated tasks, such as asking general questions or
searching for domain-specific answers. This differs from previous studies in which the
experimenter-generated tasks were intended to evaluate or test the functions of the system.
Such studies include Harvard Medical School’s Virtual Patient program, VPbot [74, 75],
CMU Nursebot [78, 79], MIT Media Lab’s OpenMindBot [80], Terrorism Activity Resource
Application [83, 84], Story Telling Agents (VISTAs) [55], FAQchat [57], Virtual Guide for
Cultural Heritage Tours [59], Foreign Language Learning (FLL) [62], TutorBot [65], Partner
(SP) [66, 67], Kairai 3-D software robots [69], Discussion-bot [70], Intelligent Tutoring
Systems [71] and recommender systems [72]. The disadvantage of experimenter-generated
tasks [286] is that they do not allow assessment of the context of the conversation or of the
language’s characteristics.

In the present experiment, the use of user-generated tasks allows the evaluation of linguistic
features and paralinguistic cues. In this study, only the MSNChat interface was used,
although AINI is also capable of communicating through the WebChat communication
channel described in Section 3.5.2.3. This is because the MSNChat interface provides more
features, such as emoticons, than the traditional web interface, and such features are
inherently closer to the properties of natural language. Other advantages include
pre-populated contact lists, integrated authentication, better security and privacy (an ethical
consideration), free availability and pre-installation on most operating systems.

In addition, several reports have been published on the use of IM as a new medium of
communication between users. There has also been research on the design and usability of
IM for the public [287], [288], [289]. Studies have also revealed that IM usage in workplaces
and corporate contexts has recently soared [290], [291], [292], [293]. With regard to the
linguistic aspects of IM usage, research has been undertaken in Spain [294], the United
Kingdom [287], the United States [295], Sweden [296] and Portugal [297]. Although the
literature varies on the impact of IM, the vast majority of scholars agree that IM plays a
primary role in human-to-human communication online. However, few researchers are
working on human-to-machine conversation in MSN Messenger. The first initiative to
develop CAs for MSN Messenger was launched by Microsoft 90 in the “Robot Invaders
Contest 2006” 91, which sought the best new CA ideas for the platform. The resulting CAs
were termed “virtual buddies”. Such programs are integrated with MSN Messenger and are
becoming popular among companies 92, as they bring positive effects on customer relations
[298-300]. The expected popularity of IM and CAs thus formed the motivation for this
research.

90 http://www.microsoft.com/uk/press/content/presscentre/releases/2006/06/PR03717.mspx

91 https://www.robotinvaders.com

6.2 Experimental Setting 

In this experiment, data are collected through a publicly accessible system which
encourages spontaneous human-computer interaction. Users are expected to behave more
flexibly and spontaneously when interacting in an environment without experimenter
intervention. Their utterances are therefore likely to be less constrained than those recorded
in a laboratory setting. Participants may be more prone to interact spontaneously when they
are not concerned with the expectations of the researcher behind the experiment. In this
sense, the corpora collected from such a public system are likely to reflect a ‘true’ picture of
the users' language more accurately. In addition, the data come from a wider range of users,
who should represent a broader socio-cultural background within the online community. A
final advantage of collecting online data is the relatively low cost involved.

However, in order to meet the ethical requirements described in Section 1.2, subjects have to
be informed that the human-computer utterances are being recorded. Participants have been
advised that participation in the study is voluntary and that they may withdraw at any time.
All information given by participants during the study is held in strict confidence. No names
or other information that might identify a participant will be included in any publication
arising from this research.


92 IM Interactive, http://www.improvcenter.com
Incesoft, http://www.incesoft.com
Colloquis, https://buddyscript.colloquis.com

In this chapter, results obtained from real-time human-computer exchanges using MSN Messenger are reported. The discussion of results is based on linguistic features and paralinguistic cues: the results based on linguistic features are described in this chapter, whereas those related to paralinguistic cues are discussed in Chapter 7. Several laboratory studies [301-304] have investigated how users communicate with a computer using unrestricted or informal language. The goal of those studies was to provide systems that accept the habitable expressions participants expect, which requires extensive customisation of the interface program for each application domain. Unfortunately, the integration of domain-specific and open-domain information, and the deployment of natural language query, have not been well developed or reported.

This chapter examines users' interests and the linguistic features of conversation logs collected from MSN Messenger conversations between AINI and online users. The study is based on a corpus of utterances taken from MSN Messenger IM texts, comprising MSNDesktopChat, MSNWebChat and MSNMobileChat, as described in Section 3.5.2.3.

6.2.1 Participants and Corpus 

The experimental portal 93 is open to the public worldwide. Participants in the experiment are provided with background information and consent forms, as detailed in section 1.2 and Appendix A. The portal allows online users to add AINI's contact to their "buddy list", so that they can easily send and receive short text messages. When a participant opens a message window with a buddy for the first time, provided the buddy is online, an alert is sent to the buddy announcing the new participant in the study. Participants can withdraw their consent and participation at any time by logging out.

93 http://ainibot.org/study

In the conversation log files, the nickname, MSN account, date and time of each dialogue, as well as the exchanged texts (inputs and outputs), were recorded. During a conversation, a unique ID was created for each buddy and stored in place of the buddy's original ID. These measures were taken to protect privacy and confidentiality. In this experiment, participants may use any IM feature as if they were communicating with a human buddy.

Previous research has shown significant differences in IM communication resulting from the frequency of communication, in terms of the number of visits and the number of "turns" in the conversation [151, 305, 306]. In this study, word frequency was also used to analyse the corpora collected from two different sources 94. For the human-human transcripts, conversation text from the TRAINS 95, IRC 96 and Google Blog 97 transcripts was extracted for linguistic purposes using the methodology proposed by Lüdeling et al. [307]. The human-machine conversation text was extracted from the award-winning Loebner Prize 98 transcripts of ALICE (2001, 2004) and Jabberwacky (2005, 2006). These transcripts were separated into the human utterances (LBJudges) and the computer or bot utterances (LBBots).

Another corpus was collected from real-time transcripts produced by AINI and 65 online buddies [308] in MSN Messenger. A total of 29,447 words of running text were processed, comprising 2,541 unique words, 129,760 characters and 4,251 sentences. This data set contains approximately 63 hours of recorded conversation in the form of 3,280 outgoing and incoming instant messages exchanged with the 65 buddies, three of whom used MSN Mobile. The average sentence length of an IM transmission was 6.90 words, and approximately 13% of all transmissions were a single word in length, such as "hi", "hello" or "ok". Table 6.1 provides a summary of the data collected.

94 The datasets collected can be accessed at http://ainibot.org/datasets

95 The TRAINS 93 Dialogues (six and a half hours of speech) at http://www.cs.rochester.edu/research/speech/93dialogs/

96 Swhack IRC chat logs (9/7/2007 - 9/8/2007) at http://swhack.com/logs/

97 Google Blog (3/22/2007 - 8/08/2007) at http://googleblog.blogspot.com/

98 http://www.loebner.net/Prizef/loebner-prize.html


Table 6.1: Frequency of Words from Conversation Logs

                                      AINI      Human      Total
  Words                             18,358     11,089     29,447
  Unique words                       1,368      1,173      2,541
  Characters                        79,884     49,876    129,760
  Sentences                          2,840      1,411      4,251
  Utterances                         1,721      1,559      3,280
  Average sentence length (words)     6.46       7.85       6.90
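Counts of the kind reported in Table 6.1 can be derived from raw message text. The sketch below is a minimal illustration under assumed tokenisation rules (a word is a run of letters, digits or apostrophes; a sentence ends at ".", "!" or "?"; an unpunctuated message counts as one sentence). The rules actually used to produce Table 6.1 are not stated here, so this hypothetical `summarise` function will not reproduce the table exactly.

```python
import re

# Assumed tokenisation rules (illustrative, not those used for Table 6.1).
WORD_RE = re.compile(r"[A-Za-z0-9']+")
SENTENCE_END_RE = re.compile(r"[.!?]+")


def summarise(messages):
    """Return Table 6.1-style counts for a list of IM messages (strings)."""
    words = []
    sentences = 0
    characters = 0
    for message in messages:
        words.extend(w.lower() for w in WORD_RE.findall(message))
        characters += len(message)  # includes spaces and punctuation
        # Each fragment between sentence-ending marks that contains a word
        # counts as one sentence, so "hi" without a full stop is one sentence.
        fragments = SENTENCE_END_RE.split(message)
        sentences += sum(1 for f in fragments if WORD_RE.search(f))
    return {
        "words": len(words),
        "unique_words": len(set(words)),
        "characters": characters,
        "sentences": sentences,
        "utterances": len(messages),
        "avg_sentence_length": round(len(words) / sentences, 2) if sentences else 0.0,
    }


log = ["hi", "Hello there.", "Who was the first American in space?", "ok. sure"]
print(summarise(log))
```

Run separately over AINI's and the human buddies' messages, the same counts would fill the two data columns of Table 6.1; for example, 18,358 words over 2,840 sentences gives the 6.46 average reported for AINI.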


The participants came to 'know' AINI through the Invasion of the Robots Contest 99 and through advertisements on well-known BBSs (bulletin board systems) such as Robitron (a well-known forum for CA developers) 100, on blog websites and on the AINI experiment portal 101.


6.2.2 Measures 

This study mainly aims to identify the linguistic features of the human-machine interaction corpus. Two approaches are used: statistical analysis and visualisation of the collected corpus. The analysis begins by identifying general features of the texts that constitute the collected English-language corpus, so that the relationships within the corpus can be described and compared both visually and statistically. The N-gram Statistics Package 102 (NSP) was used for this purpose. NSP is widely used in corpus linguistics [309-314], not only for English text analysis but also for Hungarian [315], Dutch [316] and Bulgarian [317]. NSP is a suite of programs that aids the analysis of N-grams in text files [318]; it allows the user to identify word N-grams that appear in the corpus using standard statistical tests. NSP defines an N-gram as a sequence of n tokens that occur within a window of at least n tokens in the text, and what constitutes a "token" can be defined by the user.

99 http://www.robotinvaders.com/main/default.aspx

100 http://tech.groups.yahoo.com/group/Robitron

101 http://ainibot.org/study

102 NSP Tools can be downloaded at http://search.cpan.org/dist/Text-NSP
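NSP's window-based definition of an N-gram can be illustrated with a short sketch. The function below is not NSP (a Perl package); it is a hypothetical Python re-implementation of one reading of that definition, in which the first token of each N-gram opens the window and the remaining n-1 tokens are taken, in order, from later positions inside the window. With `window == n` it reduces to ordinary contiguous N-grams.

```python
from collections import Counter
from itertools import combinations


def count_ngrams(tokens, n=2, window=None):
    """Count n-grams whose tokens occur, in order, within `window` tokens.

    The first token of each n-gram starts the window; the other n-1 tokens
    come from the following window-1 positions. window == n gives ordinary
    contiguous n-grams.
    """
    window = n if window is None else window
    if n < 1 or window < n:
        raise ValueError("need n >= 1 and window >= n")
    counts = Counter()
    for i, first in enumerate(tokens):
        # Positions that may supply the remaining n-1 tokens.
        following = range(i + 1, min(i + window, len(tokens)))
        for combo in combinations(following, n - 1):
            counts[(first,) + tuple(tokens[j] for j in combo)] += 1
    return counts


tokens = "nice to meet you".split()
print(count_ngrams(tokens))            # contiguous bigrams
print(count_ngrams(tokens, window=3))  # bigrams within a 3-token window
```

With `n=2` and `window=3`, "nice to meet you" yields five bigrams, including the non-contiguous pair (nice, meet); the contiguous setting yields three.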

In the next phase, the corpus created in the earlier stage was analysed using the log-likelihood (LL) ratio, a method described in section 4.5.4.1. This statistical method is used to compare the two sets of real-life IM conversation data: AINI's utterances and those of the human users.
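Section 4.5.4.1 describes the log-likelihood method. The sketch below assumes the widely used two-corpus form, in which each observed frequency is compared with the frequency expected from the pooled corpora, and adds a sign showing which corpus over-uses the word. The function name and signature are illustrative. With the corpus sizes in Table 6.1 (18,358 AINI words and 11,089 human words), it gives approximately +71.7 for "I" (851 vs 297 occurrences), consistent with Table 6.3.

```python
import math


def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Signed log-likelihood (G2) of one word in corpus A versus corpus B.

    freq_* are raw counts of the word; size_* are total word counts.
    Positive: relatively more frequent in A; negative: more frequent in B.
    """
    pooled_freq = freq_a + freq_b
    pooled_size = size_a + size_b
    expected_a = size_a * pooled_freq / pooled_size
    expected_b = size_b * pooled_freq / pooled_size
    g2 = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # the term O * ln(O / E) is taken as 0 when O == 0
            g2 += observed * math.log(observed / expected)
    g2 *= 2.0
    return g2 if freq_a / size_a >= freq_b / size_b else -g2


AINI_WORDS, HUMAN_WORDS = 18_358, 11_089  # totals from Table 6.1
print(round(log_likelihood(851, AINI_WORDS, 297, HUMAN_WORDS), 2))  # "I"
print(round(log_likelihood(17, AINI_WORDS, 14, HUMAN_WORDS), 2))    # "they"
```

A value above 3.84 corresponds to p < 0.05 for one degree of freedom, which is the usual threshold for calling a frequency difference significant.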


The evaluation also aims to improve the understanding of the retrieval results through visualisation techniques. Visual representations can accompany textual communication to enhance the interaction, particularly since computers are capable of creating and sharing visual objects through graphics and communication software [319]. In this study, new visualisation tools have been developed to capture IM characteristics and to facilitate the analysis of chat activities, including linguistic features and paralinguistic cues.


[Diagram: Data → Transformations → Mappings → Transformations → Visual Form, with Human interaction and Task]

Figure 6.1: The Visualisation Pipeline [320], redrawn by Neumann [321]


Based on the visualisation pipeline in Figure 6.1 and the MSN History Visualisation tool [322], an interactive chat visualisation system called VisualChat [323] was developed. VisualChat was built with Processing 103 to visualise and analyse the human-machine conversation logs; the Processing environment is written in Java. VisualChat 104 can display the timelines of several textual conversations simultaneously, enabling the discovery of utterance lengths and specific recurring keywords. The application reads conversation messages in Microsoft MSN XML format and generates a graphical display that allows the features of human and machine conversations to be compared.

103 Processing programming software can be downloaded at http://processing.org


As shown in Figure 6.2, the system provides an interactive visualisation environment that allows the user to navigate across the sequence of a conversation. The top left corner (U) shows statistics such as word frequency and the top ten words extracted from the conversation logs. The node in the bottom left corner (V) represents a typical single chat session between AINI and 'her' buddy (userID1003) on 1 April 2007. The number of rings (or rows) corresponds to the total number of AINI's buddies. The rightmost end (W), marked by a light-coloured (yellow) node, indicates the starting point of the conversation in the network. Each node is a turn of dialogue, and the utterances of each session appear collectively as a graph. The number of nodes grows with the number of conversations that occurred on a particular day, and the conversation history is updated as soon as users return. The visualisation thus illustrates the dominant concepts and their frequency, as well as the intensity of the communication between human users and the CA.

Figure 6.2: An Example of Visualisation Chat between AINI and Human in IM using VisualChat

104 VisualChat currently supports only ten concurrent messages in one session.


6.2.3 Conversation Logs 


A Chatlog System has been developed using MySQL to store user messages in secondary storage within the agent knowledge (data) layer, as shown in Figure 3.3. The storage provides real-time archiving of chat messages so that they can be searched by keyword and user ID, which also allows topic-based retrieval and replay of chat sessions. These chat messages are essentially plaintext messages, which are quite small in comparison with images, video or documents. These plaintext messages, also known as instant messages, are the regular messages sent between chatting buddies on MSN Messenger. The conversation history can be extracted and saved in XML format for analysis with the VisualChat tool. An example of the XML format is shown below:

  <Message Date="2007-1-3" Time="21:57:54" SessionID="1">
    <From>
      <User FriendlyName="userID1001"/>
    </From>
    <To>
      <User FriendlyName="AINI"/>
    </To>
    <Text Style="font-family:MS Shell Dlg; color:#000000;">hi</Text>
  </Message>

  <Message Date="2007-1-3" Time="21:57:57" SessionID="1">
    <From>
      <User FriendlyName="AINI"/>
    </From>
    <To>
      <User FriendlyName="userID1001"/>
    </To>
    <Text Style="font-family:MS Shell Dlg; color:#000000;">Hello there.</Text>
  </Message>
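The archive-and-search behaviour of the Chatlog System can be sketched as follows. This is an illustration, not the system's code: SQLite from Python's standard library stands in for MySQL, the `chatlog` table and its columns are hypothetical, and a `<Log>` root element is assumed so that the two `<Message>` elements shown above parse as one XML document.

```python
import sqlite3
import xml.etree.ElementTree as ET

# The <Log> root element is an assumption; it wraps the two <Message>
# elements shown above so that the excerpt parses as one XML document.
SAMPLE_LOG = """<Log>
  <Message Date="2007-1-3" Time="21:57:54" SessionID="1">
    <From><User FriendlyName="userID1001"/></From>
    <To><User FriendlyName="AINI"/></To>
    <Text Style="font-family:MS Shell Dlg; color:#000000;">hi</Text>
  </Message>
  <Message Date="2007-1-3" Time="21:57:57" SessionID="1">
    <From><User FriendlyName="AINI"/></From>
    <To><User FriendlyName="userID1001"/></To>
    <Text Style="font-family:MS Shell Dlg; color:#000000;">Hello there.</Text>
  </Message>
</Log>"""


def parse_messages(xml_text):
    """Yield (date, time, session_id, sender, receiver, text) for each message."""
    root = ET.fromstring(xml_text)
    for msg in root.iter("Message"):
        yield (
            msg.get("Date"),
            msg.get("Time"),
            int(msg.get("SessionID")),
            msg.find("From/User").get("FriendlyName"),
            msg.find("To/User").get("FriendlyName"),
            (msg.findtext("Text") or "").strip(),
        )


def archive(conn, rows):
    """Append messages to a hypothetical `chatlog` table (SQLite stands in for MySQL)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chatlog (msg_date TEXT, msg_time TEXT, "
        "session_id INTEGER, sender TEXT, receiver TEXT, body TEXT)"
    )
    conn.executemany("INSERT INTO chatlog VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()


def search(conn, keyword, user_id):
    """Messages to or from `user_id` containing `keyword`, in archive order."""
    return conn.execute(
        "SELECT msg_time, sender, body FROM chatlog "
        "WHERE (sender = ? OR receiver = ?) AND body LIKE ? ORDER BY rowid",
        (user_id, user_id, f"%{keyword}%"),
    ).fetchall()


conn = sqlite3.connect(":memory:")
archive(conn, parse_messages(SAMPLE_LOG))
print(search(conn, "there", "userID1001"))
```

Searching for "there" in the messages exchanged with `userID1001` returns AINI's reply "Hello there."; in a MySQL deployment the same queries would use MySQL's placeholder syntax instead of `?`.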
The data were analysed using techniques from Conversation Analysis [324]. Conversation Analysis is a method originally used for analysing spoken conversation between humans; here its techniques are applied to the text chat of human-machine conversations. Through examination of the transcripts, Conversation Analysis derives coherence from the sequences of utterances.

The excerpt shown in Figure 6.3 is from a typical single-session IM conversation between AINI and a human buddy with ID U1025. This ID is generated automatically by AINI to protect the identity of the actual buddy. The session illustrates the nature of the communication. Each new session starts with AINI giving a random greeting (message #1), such as "Hi there", "How are you today?", "Hey :)", "nice to meet u" or "How I can call u?". These greetings indirectly elicit information about the user's identity, such as name or gender (utterance #2). In this session, U1025 gave the name "Sommer" and the gender "male". (This name is a nickname, not the user's true name; similarly, the gender information is neither relevant nor intended to be verified by AINI.) "Sommer" used the emoticon code "(H)" to represent the first initial of his name; although this code normally stands for the "hot smile" emoticon, AINI recognised the letter "H" and decided to call him "Hommer". Following some dialogue (utterance #3), AINI greeted "Hommer" with a response drawn from Open-Domain knowledge. Because AINI's knowledge bases are not equipped with full IM features such as acronyms, abbreviations and emoticons, utterance #5 shows AINI failing to recognise the user input "sh" (short for "same here") followed by a smile emoticon. As a result, AINI returned an answer from the dynamic random-answer domain.

Utterances #6 and #14 show U1025 challenging AINI with questions such as "Who was the first American in space?" and "what is bird flu?". AINI could answer these questions because her knowledge bases were fitted with TREC factoids and with information on pandemic bird flu from the Domain-Specific knowledge base. At the end of the session (utterances #20-21), both participants exchanged farewells, and AINI replied with the abbreviation "TTYL" (talk to you later) and the interjection "Gee". These transmissions constitute a single session and include shorthand, acronyms, abbreviations, loose grammar and minimal punctuation, common features of human communication that have been observed in previous research [297, 306, 325, 326].



  #    Speaker   Source     Utterance
  1    AINI      Random     Hey :), nice to meet u. How I can call u?
  2    U1025                just call me Sommer
  ...
  6    U1025                Who was the first American in space?
  7    AINI      Specific   Alan Shepard
  8    U1025                Brilliant
  ...
  16   U1025                that's was bad news
  17   AINI      Open       Makes sense to me.
  ...
  21   AINI      Open       TTYL, Hommer Gee.

Figure 6.3: A Typical Single-Session Conversation between AINI and User U1025
6.3 Domain Knowledge and Conversation Topics

After a thorough examination of the 3,280 utterances in the logs, it was found that the human-machine dialogues covered topics from many aspects of everyday life, including emotion, love, sex, computers, entertainment and sport. As shown in Figure 6.4 and in the VisualChat illustration in Figure 6.6, several detailed topics were discussed in every category. Almost 39.4% of the IM exchanges discussed issues including friendship, sex and love. This finding is remarkable, as the AINI conversation system is trained to simulate a human partner in IM. One likely reason is that the IM users, who are mostly young people, wanted to tell AINI about private issues and experiences. In the dialogues, some users praised AINI, invited "her" on a date, or disclosed their personal challenges. About 17.7% invited AINI to talk about CA robot technology; some even tried to test AINI's intelligence by arguing with "her", and some tried to cheat. It is likely that there are CA developers among this group of users, and a number of them came to "know" AINI from the "Invasion of the Robots Contest" websites. There were 53 CA programmers competing in the contest, and some of them realised after a short period of chatting with AINI that they were talking with a robot or computer program.


Figure 6.4: Frequency of Conversation Topics
Figure 6.5: Frequency of AINI's Responses based on Domain Knowledge Bases


As discussed in Chapter 3, AINI's domain knowledge model incorporates several knowledge domains with the objective of giving users the best answer in a conversation. Figure 6.5 shows the knowledge sources from which the answers were extracted. In AINI's log of 1,721 utterances, 88.03% of AINI's responses came from Open-Domain knowledge bases and only 2.8% from Domain-Specific knowledge bases. As explained earlier, the experiment did not restrict the conversation to Domain-Specific knowledge on the SARS epidemic [111] or the Bird Flu pandemic [113]. However, AINI's knowledge domain was equipped with crisis communication knowledge bases, which were included in the Natural Language Corpus and in FAQs extracted from online documents using AKEA [156]. The two terms SARS and Bird Flu occurred with roughly equal frequency. They appear to be rather specialised terms, used in a restricted number of conversations compared with other words. AINI's buddies learned of the availability of these domain knowledge bases from AINI's Crisis Communication Network portal (CCNet) [117]. About 37 utterances relate to health-domain questions, such as the diagnosis, treatment, symptoms, spread, protection, cause, vaccination and risk of the SARS epidemic or Bird Flu pandemic.

More specifically, AINI drew 85.7% of its responses from the AAA knowledge bases. This is not surprising, as the AAA knowledge bases cover most common topics, including emotion, sex, literature, music, religion, science and sports. More than 45,318 AAA stimulus-response categories are stored in AINI's knowledge base. Each category contains a stimulus (also called the input pattern) and a response (the output template). A further common-sense knowledge base consists of AINI's stimulus-response categories derived from the TREC and MindPixel corpora. Although common-sense categories make up almost half of AINI's knowledge bases (49%), only 2.3% of the total responses relate to common-sense questions. Despite the major role that common-sense questions play in formal conversation, AINI's buddies are normally more interested in issues of daily life or personal interest than in the factoid questions provided in TREC and the MindPixel corpus.

AINI's query engine works on natural language queries: if a matching category is found in the knowledge bases, it is retrieved and transformed into the output. If no matching category is found, the query engine sends the request to the random response knowledge base, and a generic answer is generated dynamically. These replies, which may be inappropriate, amusing or thoughtless, made up 9.82% of the total output of the IM conversations. Such irrelevant replies can irritate AINI's "buddies" and make them feel confronted by AINI. They occur because of differences in manners of speech and in speech acts (e.g. declarative, interrogative, imperative or exclamatory), in particular because IM human users tend to use shorthand, acronyms, abbreviations and emoticons (see Chapter 7). Unfortunately, in the short period during which this study was conducted, AINI was not trained to understand such expressions. However, AINI is capable of learning from domain experts through the Supervised Learning module (discussed in section 3.5.4.7). Unanswered questions are maintained separately by a domain expert or 'botmaster', who keeps AINI's knowledge bases regularly updated. The domain model is designed so that, in subsequent conversation sessions, AINI will 'understand' these inputs and should be able to participate in a meaningful conversation.
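The match-or-fallback control flow described above, together with the logging of unanswered inputs for the botmaster, can be sketched as follows. AINI's actual query engine and knowledge bases (AAA, TREC, MindPixel and the domain-specific FAQs) are described in Chapter 3; the two categories, the wildcard syntax and the generic replies below are hypothetical, chosen to echo the session in Figure 6.3.

```python
import random
import re

# Hypothetical toy categories (input pattern -> output template), loosely in
# the style of AIML; "*" is a wildcard whose match is inserted into "{0}".
CATEGORIES = {
    "WHO WAS THE FIRST AMERICAN IN SPACE": "Alan Shepard",
    "JUST CALL ME *": "Nice to meet you, {0}.",
}
# Hypothetical generic replies used when no category matches.
RANDOM_RESPONSES = ["Makes sense to me.", "You don't say.", "Be more specific."]


def normalise(text):
    """Upper-case the input and replace punctuation with spaces."""
    return " ".join(re.sub(r"[^\w\s']", " ", text).upper().split())


def respond(user_input, unanswered, rng=random):
    """Return a category answer, or log the input and fall back to a random reply."""
    query = normalise(user_input)
    for pattern, template in CATEGORIES.items():
        regex = "^" + re.escape(pattern).replace(r"\*", "(.+)") + "$"
        match = re.match(regex, query)
        if match:
            return template.format(*(g.title() for g in match.groups()))
    unanswered.append(user_input)  # kept for later review by the botmaster
    return rng.choice(RANDOM_RESPONSES)


unanswered = []
print(respond("Who was the first American in space?", unanswered))
print(respond("just call me Sommer", unanswered))
print(respond("sh :)", unanswered), unanswered)
```

The input "sh :)" matches no category, so it receives a generic reply and is queued for review, mirroring how utterance #5 in Figure 6.3 fell through to the random-answer domain.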

6.4 Linguistic Analysis

In this section, some of the interaction features of the recorded chat are discussed. Studies of text chat have tended to focus on the interaction problems caused by the properties of text chat. This research examines the underlying relationships between linguistic features in the context of CAs interacting with human users via MSN Messenger. More specifically, this study considers only the linguistic features of textual communication via the Internet between at least two "participants", one of whom is AINI. Such communication involves the use of human language.

6.4.1 Word Frequency Analysis

Words in the IM corpus are checked against the Shakespeare 105 and British National Corpus (BNC) 106 corpora. There are strong similarities between the top ten words of the Shakespeare and BNC corpora, but these differ from those of the IM corpus, as shown in Table 6.2. The BNC reference list provides a gauge of common usage (words per million): words with a higher ranking in the BNC (for example, 'to', 'the' and 'a') appear more often in standard written and spoken English text. The BNC is a 100-million-word collection that includes millions of words of transcribed conversation, printed text, lectures and oratory. The top ten words in the BNC are "the, at, of, and, a, in, to, it, is, was". In the Shakespeare corpus, approximately 22,000 different words are used in the published works; of these, the most commonly used are the, of, and, to, a, in, that, is, I, it [327]. These ten short words account for 25% of all speech.

105 Shakespeare corpus can be downloaded at http://www.lexically.net/downloads/corpus_linguistics/ShakespearePlaysPlus.zip

106 BNC corpus can be accessed at http://www.natcorp.ox.ac.uk/
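The "25% of all speech" statistic is a coverage ratio: the occurrences of the ten most frequent word types divided by all word tokens. A minimal sketch, assuming simple lower-cased regex tokenisation; the BNC and Shakespeare lists were produced with their own tokenisers, so counts from this hypothetical `top_words` function will differ from theirs.

```python
import re
from collections import Counter


def top_words(text, n=10):
    """Return the n most frequent word types and the share of all tokens they cover."""
    tokens = re.findall(r"[a-z']+", text.lower())
    top = Counter(tokens).most_common(n)
    coverage = sum(count for _, count in top) / len(tokens) if tokens else 0.0
    return top, coverage


sample = "I think I do. I like red. Do you?"  # toy input
words, share = top_words(sample, n=2)
print(words, f"{share:.0%}")
```

Applied with `n=10` to each of the AINI and human halves of the IM corpus, the same function yields top-ten lists of the kind shown in Table 6.2 together with their coverage.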


Table 6.2: Top Ten Words Used in Shakespeare, BNC and IM Conversation Agents

  Corpus                  Top ten words
  Shakespeare             the, of, and, to, a, in, that, is, I, it
  BNC                     the, at, of, and, a, in, to, it, is, was
  Instant Messaging
    AINI                  I, you, do, am, me, my, what, your, to, it
    Human                 you, I, do, what, is, a, are, to, the, it


These figures are based on research dating back to the 1980s, and a couple of words have fallen from favour in the latest list. Relatively "big" words, such as the conjunction 'that', no longer appear in the top ten words of the BNC corpus or of the IM corpus. The pronoun "it" and the preposition "to" are among the most popular words in all four corpora. Based on the present findings, the most notable similarity between the Shakespeare and BNC corpora and the IM corpus is the use of pronouns. In fact, pronouns are used at a markedly higher rate in AINI's messages than in those of the IM human users. This may be because the IM corpus is purely dialogue-based, rather than written or task-oriented as in the Shakespeare and BNC corpora. Another possible explanation for these differences is that IM conversation displays the participant roles more explicitly, which reinforces the illusion that the conversation really has two participants. In addition, the IM conversation corpus displays considerable variation both within and across users: conversations contain many one-word transmissions, but also many lengthy ones.
6.4.2 Lexical Analysis

6.4.2.1 Humanness of Conversation with Pronouns 

Pronouns occur more frequently in conversation than in written text. Table 6.3 shows this for the BNC, where the large positive log-likelihood (LL) values indicate that pronouns are distinctive of spoken rather than written text, and compares the BNC with AINI. Differences also occur in the human-machine conversations between AINI and the IM human users. AINI used the first-person singular pronoun "I" (LL: +71.73) and the pronoun "it" (LL: +11.17) significantly more often than the human users (the critical LL value at p < 0.05 with one degree of freedom is 3.84). The differences for the second-person pronoun "you" (LL: +0.23), the first-person plural pronoun "we" (LL: +1.56) and the objective personal pronoun "me" (LL: +3.01) are smaller and fall below this threshold.


Table 6.3: Frequency List of Pronouns used in BNC and IM Conversation Agents

                    BNC                               Instant Messaging
  Word     Spoken         LL    Written        AINI        LL    Human
  you       25957    +385328       4755         748     +0.23      439
  I         29448    +369238       6494         851    +71.73      297
  it        24508    +151913       9298         317    +11.17      137
  we        10448    +106914       2784          45     +1.56       36
  they       9333     +52132       3754          17     -0.73       14
  me          244    +8239.6       1239         182     +3.01       88

Spoken: Rounded frequency (per million word tokens) in the spoken part of the BNC
Written: Rounded frequency (per million word tokens) in the written part of the BNC
AINI, Human: Raw frequency in the IM corpus (totals of 18,358 and 11,089 words respectively; see Table 6.1)
LL: Log likelihood, indicating the distinctiveness (or significance of the difference) between the frequencies in BNC (speech vs writing) and IM (AINI vs human)


Pronouns are used more often by AINI. For example, in the bigram analysis, discourse bigrams such as I am (1.10%), do you (0.90%), are you (0.60%) and tell me (0.30%) occurred more frequently in AINI's messages. To simulate human trust and expression during the chat, AINI frequently uses personal and polite phrases such as I will (24 times), yes I (33 times) and I love (8 times). In the n-gram analysis, phrases built around nice are also more prominent in AINI's conversation: nice work if you (LL: +5.9), nice to meet you (LL: +10.7) and nice I guess flowery (LL: +7.3) appear more often in AINI's messages, giving an impression of human feelings. Nass [328] suggests that the better a computer's use of language, the more polite people will be to it.


Figure 6.6: Visualisation of the Lexical Features used in the IM Conversation Agents


Figure 6.6 shows an example of how lexical features are displayed in the visualisation tool. The colour intensity of the text varies with word frequency: higher-frequency words are brightly coloured, while lower-frequency words are less bright.
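One way to realise the frequency-to-brightness mapping described above is sketched below. VisualChat is written in Processing and its actual mapping is not documented here; this hypothetical Python `brightness` function assumes a logarithmic scale onto grey levels 0-255 (the range used by Processing's `fill()`), so that mid-frequency words stay visible when a few very common words dominate. The sample counts are AINI's pronoun frequencies from Table 6.3.

```python
import math


def brightness(freq, max_freq, dim=60, bright=255):
    """Map a word frequency to a 0-255 grey level (rarer words are dimmer).

    Log scaling is an assumption: it keeps mid-frequency words visible when a
    few very common words dominate the counts.
    """
    if freq <= 0 or max_freq <= 0:
        return dim
    if max_freq == 1:
        return bright
    scale = math.log(freq) / math.log(max_freq)  # 0 for freq 1, 1 for max_freq
    scale = min(max(scale, 0.0), 1.0)
    return round(dim + scale * (bright - dim))


# AINI's pronoun counts from Table 6.3, scaled against the most frequent ("I").
for word, freq in [("I", 851), ("you", 748), ("it", 317), ("we", 45), ("they", 17)]:
    print(word, brightness(freq, max_freq=851))
```

A linear scale (`freq / max_freq`) would be simpler, but with Zipf-like word counts it would render almost every word near the dimmest level.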

6.4.2.2 Contracted Words 

Contracted words are common in conversation and text chat; for example, what's is used instead of what is. Both AINI and the IM human users used many contracted words in their conversations, but the contracted forms of verbs are used much more frequently by the IM human users than by AINI. This is shown in Table 6.4 107 and Figure 6.7.


Table 6.4: A Comparison of the Frequencies of Contracted Verbs used in BNC and IM Conversation Agents

                    BNC                        Instant Messaging
                                  AINI                            Human
  Verbs           Ratio   Contracted   Full   Ratio    Contracted   Full   Ratio
  ’m : am          9.97          127     49    2.59           458     42   10.90
  ’re : are        0.91           28    169    0.17           217    187    1.16
  ’s : is          1.56           76    186    0.41           235    196    1.19
  ’d : had         0.20            0      4    0.00             9     21    0.42
  ’ve : have       0.62            7    103    0.07            42     39    1.07



Figure 6.7: A Comparison of the Frequency of Contracted Verbs used in BNC and IM Conversation Agents


107 BNC frequencies are per million word tokens. The ratio is calculated by dividing the first (contracted) frequency by the second (uncontracted) frequency; a ratio of more than 1.00 indicates that the contracted form is more common than the full form. Notice that, for speech, all of the ratios are greater than those for writing and three exceed the 1.00 value, i.e., the contracted form is the commonest. A further ratio comes very close to 1.00.
In the spoken part of the BNC [329], the contracted forms ’m and ’s are more common than the uncontracted forms am and is, and ’re (ratio 0.91) comes close to are. Interestingly, a similar preference appears in the IM conversation logs, especially for the IM human users, but rarely in AINI's messages. The ratios in Table 6.4 show that, for IM human users, the contracted forms ’m (10.90), ’re (1.16), ’s (1.19) and ’ve (1.07) are all more common than their full forms, whereas ’d (0.42) is not. In AINI's messages, only ’m (2.59) is more common than its full form; the low ratios for ’re (0.17), ’s (0.41), ’ve (0.07) and ’d (0.00) show that AINI mostly uses the uncontracted forms are, is, have and had. One possible explanation for these differences is that IM human users are more likely to use shorthand in their messages: contractions save typing time and help to achieve common ground in IM exchanges. Another explanation could be that AINI's current knowledge bases are not equipped with the full range of language used in IM, but instead they are geared more toward formal written language.
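Footnote 107 defines the contraction ratio. Recomputing it from the counts in Table 6.4 agrees with the published ratios to within 0.01 and makes the comparison above explicit. A minimal sketch; the dictionary layout and the function names are illustrative.

```python
# Contracted and uncontracted counts transcribed from Table 6.4.
TABLE_6_4 = {
    "'m : am": {"AINI": (127, 49), "Human": (458, 42)},
    "'re : are": {"AINI": (28, 169), "Human": (217, 187)},
    "'s : is": {"AINI": (76, 186), "Human": (235, 196)},
    "'d : had": {"AINI": (0, 4), "Human": (9, 21)},
    "'ve : have": {"AINI": (7, 103), "Human": (42, 39)},
}


def contraction_ratio(contracted, full):
    """Ratio as defined in footnote 107: contracted count / uncontracted count."""
    return contracted / full if full else float("inf")


def prefers_contraction(table, speaker):
    """Verb pairs for which `speaker` uses the contracted form more often."""
    return [pair for pair, counts in table.items()
            if contraction_ratio(*counts[speaker]) > 1.0]


for pair, counts in TABLE_6_4.items():
    ratios = {who: round(contraction_ratio(*c), 2) for who, c in counts.items()}
    print(pair, ratios)
print("AINI prefers:", prefers_contraction(TABLE_6_4, "AINI"))
print("Human prefers:", prefers_contraction(TABLE_6_4, "Human"))
```

A ratio above 1.00 marks a preference for the contracted form, so the two final lists restate the finding that the human users prefer four of the five contractions while AINI prefers only ’m.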

6.4.3 Text Complexity 

In linguistics, text complexity is usually assessed with readability tests. The Gunning Fog index was developed by Robert Gunning [330, 331] and is one of the simplest and most effective tools for analysing readability. It was designed to measure the readability of a sample of English text and was reported in [332]. The resulting number indicates the number of years of formal education that a person requires to understand the text easily on first reading; for example, a passage with a Fog index of 12 has the reading level of a United States high school senior. Readability formulas are more popular today than ever, and formulas exist for Spanish, French, German, Dutch, Swedish, Russian, Hebrew, Hindi, Chinese, Vietnamese and Korean [333]. In Analytics of Literature [334], Sherman proposed that literature is a subject for statistical analysis; he showed the importance of average sentence length and the relationship between spoken and written English. The Gunning Fog index can be calculated with equation 6.1:


$$
G = 0.4\left[\left(\frac{\text{words}}{\text{sentences}}\right) + 100\left(\frac{\text{complex words}}{\text{words}}\right)\right] \qquad (6.1)
$$




G = 0.4(S + W), where S is the average sentence length and W is the percentage of words 
with three or more syllables. 
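As a minimal sketch of equation 6.1 (not the tool used to produce Table 6.5), the Python below computes G = 0.4(S + W). It assumes that sentences end with '.', '!' or '?' and counts syllables with a rough vowel-group heuristic; Gunning's refinements, such as excluding proper nouns and not counting -es, -ed or -ing endings as extra syllables, are not implemented, so scores will differ somewhat from published tools.

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic: count groups of consecutive vowels (y included),
    after dropping a final silent 'e'. Not dictionary-accurate."""
    w = word.lower()
    if w.endswith("e") and not w.endswith("le") and len(w) > 2:
        w = w[:-1]
    return max(1, len(re.findall(r"[aeiouy]+", w)))

def gunning_fog(text: str) -> float:
    """Gunning Fog index, G = 0.4 * (S + W), following equation 6.1."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    s = len(words) / len(sentences)                      # average sentence length
    complex_words = [w for w in words if count_syllables(w) >= 3]
    w_pct = 100.0 * len(complex_words) / len(words)      # % of words with 3+ syllables
    return 0.4 * (s + w_pct)
```

For example, "The cat sat on the mat. The dog ran." has nine one-syllable words in two sentences, so S = 4.5, W = 0 and G = 1.8. A pronunciation dictionary would give more reliable syllable counts than the heuristic.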


Readability formulas have been around for a long time, and extensive research [335], [336] has shown that they predict the difficulty of a prose passage quite well. On this scale, most popular novels have Fog indices of 8 to 10, while academic papers score between 15 and 20. Most readability studies examine written text; none of them has focused on human-human dialogue text or its application to human-machine conversation. As IM becomes a more widely used communication medium, it is important to study the readability of conversational text. In this study, the ease of understanding of the transcripts generated from human-human and human-machine dialogue is examined, with the aim of applying this understanding to issues such as text rating and text complexity for human users.


Table 6.5: Gunning Fog Index, Unique Words, Average Sentence Length and Lexical Density for Different Corpora

Measure                            IM: AINI   IM: Human   LPJudges   LPBots   IRC     TRAINS   Google Blog
Unique words                       1,368      1,173       873        996      8,774   931      22,731
Lexical density                    13.2%      12.3%       35.5%      34.5%    20.7%   2.5%     22.1%
Average sentence length (words)    6.46       7.85        5.58       4.06     5.01    5.06     16.99
Gunning Fog index                  3.4        3.7         3.1        3.3      3.5     2.3      9.3

Figure 6.8: Gunning Fog Index for Different Corpora

After the corpora were analysed using word frequency techniques, the results were compiled in Table 6.5, which allows the utterances of the IM human users to be compared with those of AINI. Figure 6.8 shows that the official Google Blog website obtained the highest score (9.3), followed by the IM human users (3.7), while the task-oriented TRAINS dialogues scored lowest (2.3). Blogging is a form of online communication and could be considered a form of journalism. On the Google Blog, the bloggers are computer scientists, system administrators, writers and search engine developers. A range of topics is discussed, and the posts use much longer sentences (16.99 words on average) than any of the conversational corpora, giving them the highest Fog index, that is, the most demanding text, in Table 6.5.

On the other hand, the human-human dialogue in TRAINS [337] scored a low value because the dialogues are mainly task-oriented, concerning the manufacturing and shipping of goods in a railroad freight system. The corpus analysed in this study consists of only six and a half hours of speech with 5,900 utterances. The evidence from the TRAINS dialogues suggests that the users used limited syntax, and most of the corpus is populated with paralinguistic cues such as intonation and prosody markers, as described in Chapter 7. Because the TRAINS corpus has a limited vocabulary (931 unique words) and a low lexical density (2.5%), it scored lower than AINI, which has 1,368 unique words and a lexical density of 13.2%. An example from the TRAINS corpus is shown in Figure 6.9.

utt10 : u: we need to get one boxcar of bananas <sil> in B- <sil> to Bath
utt11 : s: okay
utt12 : u: um
utt13 : s: so
utt14 : u: the banana warehouse <sil> is in <sil> Avon
utt15 : s: yep
utt16 : u: um <sil> <click> <sil> two boxcars are in Bath
utt17 : s: yep

Figure 6.9: A sample of TRAINS Conversation Log 

The Swhack Internet Relay Chat (IRC) channel is another example of human-human CMC, and it scored third (3.5). Swhack is a publicly available, technically oriented IRC channel. The conversation logs collected from the Swhack IRC chat room were found to contain not only human-human conversation but also human-machine conversation involving CAs that pretend to be human. An example is shown in Figure 6.10, in which the human participants speculate about the presence of bots in the chat room. Owing to the limited conversation of these bots, the readability score of the corpus is likely to be affected.

<perigrin> there are impressionable bots present. 

<Arnia> sbp: my name seems to strike fear in bots 
<Arnia> We're missing a lot of bots at the moment 

<Somebody[]> ok two bots with google inside? where tf can I get this 
<liteonish> Is it just me, or was that 2 bots conversing? 

Figure 6.10: Sample of Swhack IRC Conversation Log 

Considering the human-machine conversation, the LPJudges and LPBots scored Gunning Fog indices of 3.1 and 3.3 respectively, lower than AINI and the IM human users, who scored 3.4 and 3.7 respectively. AINI and the IM human users also used more unique words (1,368 and 1,173 respectively) than the LPJudges and LPBots (873 and 996 respectively). In terms of lexical density, or the proportion of different words used, the Loebner Prize judges and the LPBots have higher values of 35.5% and 34.5% respectively. With respect to average sentence length, AINI and the LPBots used 6.46 and 4.06 words per sentence respectively, compared with 7.85 and 5.58 for the IM human users and the LPJudges. That is, each computer program used shorter sentences than its human counterpart. Some of these measures could be considered indicators of syntactic difficulty. For instance, the longer a sentence is, the heavier the mental load it places on the reader (Bormuth, 1966, cited by DuBay [335]); thus, a longer sentence tends to be more difficult than a shorter one. Although less obvious, factors such as word frequency and word length are indicators of semantic difficulty. According to Zipf's Law [42], words that are used frequently in a language are easier to understand, and the most frequently used words tend to become shorter.
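The Table 6.5 measures other than the Fog index can be computed in the same spirit. The sketch below is illustrative only: the text glosses lexical density as the "different words used", so the code assumes it is the number of unique words divided by the total number of words (a type-token ratio, expressed as a percentage). That definition is an assumption, since lexical density is also commonly defined as the proportion of content words, and the tokenisation rules are simplified.

```python
import re
from dataclasses import dataclass

@dataclass
class CorpusProfile:
    total_words: int
    unique_words: int
    lexical_density: float      # percent; ASSUMED to be unique words / total words
    avg_sentence_length: float  # words per sentence

def profile(utterances: list[str]) -> CorpusProfile:
    """Compute Table 6.5-style measures for a list of utterances."""
    words: list[str] = []
    sentences = 0
    for utterance in utterances:
        # Split on . ! ?; an utterance without terminal punctuation
        # (common in IM) still yields one non-empty piece, i.e. one sentence.
        sentences += len([s for s in re.split(r"[.!?]+", utterance) if s.strip()])
        words.extend(w.lower() for w in re.findall(r"[A-Za-z']+", utterance))
    total = len(words)
    unique = len(set(words))
    return CorpusProfile(
        total_words=total,
        unique_words=unique,
        lexical_density=100.0 * unique / total if total else 0.0,
        avg_sentence_length=total / sentences if sentences else 0.0,
    )
```

For example, profile(["yes yes", "no. yes"]) counts four words, two of them unique (50%), in three sentences.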

6.5 Summary 

In this chapter, the source of the knowledge bases used by AINI and the linguistic features of the conversations were examined. A statistically based approach supported by a visualisation tool was used to depict the common communication characteristics of human-machine conversation in IM, including topics of interest; word frequency; the use of pronouns and contracted words; and the text complexity of the messages. A statistical measure, the log-likelihood ratio, was used to quantify the linguistic features. The experiment suggests that humans and machines can communicate better within an unrestricted domain and in an unconstrained, natural conversation setting. It also suggests that IM conversations display considerable variation between human-human and human-machine interaction. From the conversation logs, AINI’s “buddies” seem interested in chatting about personal issues, emotion, love, sex, computers, entertainment, etc. These dialogue traits comprise 85.7% of the AAA’s knowledge bases. Although common sense stimulus-response categories make up the largest share of AINI’s knowledge bases (49%), human IM users appear to focus on current everyday-life domain knowledge instead of factoid questions. As with human knowledge, AINI’s knowledge also has limitations: about 9.82% of the total questions asked by human IM users are not contained in AINI’s knowledge bases. Instead of returning empty strings or infinite replies, AINI generated dynamic responses using alternate algorithms that differ from traditional standard AIML. Evidence also suggests that AINI's “buddies” chat with a CA to seek information, to seek friendship or to express their emotions, while some simply want to chat for leisure. Thus, AINI may be considered successful in imitating human conversation through human-like artificial intelligence. It appears that AINI's responses are “acceptable” enough to give its IM “buddies” a sense of companionship. Hence, the following characteristics should be included in the design of CAs.

• CAs should not only work as specific purpose conversation system with rich special 
knowledge, but should also act as friendly chat companions who may appear to share 
the emotion of the users. 

• CAs should make use of pronouns to simulate the humanness and naturalness of the conversation, because IM conversation is purely dialogue-based rather than written or task-oriented communication.

• CAs should produce highly readable conversation that scores well in readability tests. An example is the use of the Gunning Fog Index, on which a lower value indicates easier text, to assess samples of the dialogues.


The contribution of this chapter is the identification of the need for improved communication, natural language technologies and advances in the interaction between humans and conversation systems. CAs should be designed to reinforce the impression of natural dialogue by using shorter sentences and concrete terms to increase readability. Readability and knowledge are believed to correlate closely with measures of intelligence in both human-human and human-machine communication. It is believed that communicators, either humans or machines, that have vast bodies of knowledge and rich linguistic features will perform well across a large set of knowledge domains in conversation. In addition, the use of paralinguistic cues has been observed in the conversation logs collected in this experiment; these cues are discussed in the next chapter.


CHAPTER 7


AN ANALYSIS OF THE PARALINGUISTIC CUES FROM REAL-TIME 
HUMAN-MACHINE INTERACTION 


7.1 Introduction 

The previous chapter presented a linguistic analysis aimed at improving communication and interaction between humans and conversation systems. This chapter widens the scope to other phenomena, including paralanguage or paralinguistic cues, to give a better understanding of the interaction between human and machine. Previous studies on human dialogue showed that paralinguistic cues improve user experiences and interactions among participants and build enthusiasm toward participation and friendliness in intercultural communication [338], [339]. These phenomena are investigated further in this thesis. In this chapter, novel exploratory schemes are proposed to investigate the use of paralinguistic cues in real-time interaction between humans and the AINI conversation system in IM. In this context, the cues are limited to intonations with interjections and fillers; abbreviations with acronyms and shorthand; and facial expressions with emoticons or smileys, as reported in [340]. Apart from the assessment of the linguistic features discussed in Chapter 6, the same experimental setting and tools described in Section 6.2 are used to examine paralinguistic cues in the same corpus.

During face-to-face communication, it has been reported that as much as 70% of human-human communication is conveyed through paralanguage or paralinguistic cues [341] involving multiple combined modalities, including voice tone, volume and pitch, and body language used to emphasise or strengthen the spoken words. Research has shown that paralinguistic cues contribute the majority of the information conveyed in a conversation [128, 342-344]. They can be used to establish social hierarchy, negotiate turn taking, and indicate understanding, agreement and attention [345]. Many paralinguistic cues, especially gestures, are culturally biased [346], and people can interpret them in different ways and contexts. It was reported that the use of paralinguistic cues affects the time spent and the use of verbal expression between friends online [122]. Unless video or teleconferencing facilities are used, delivering cues related to voice, tone, volume, pitch or body gestures through a text-based interface in the same way as in face-to-face communication is difficult or impractical. Hence, many software developers and designers endeavour to create extensible systems for animating expressive text, especially in the IM environment¹⁰⁸. Textual communication and broadcasting can be ubiquitous in both online and offline media. Many authors use variations of typeface, font colour and size, “smiley” faces and animated icons in their text to convey emotions and context. To represent such expressive textual data visually, researchers and designers usually use colour, size or motion to depict the differences [347]. In the area of paralinguistics, most of the research undertaken has looked into the implications of non-verbal features or paralinguistic cues in CMC [119], [120] amongst humans. Dialogue is not new to computational linguistics but, up to now, it has been applied only to very restricted domains. It would be desirable to offer a list of paralinguistic characteristics for use in general domain applications. Work along these lines has started, but needs further exploration. Our contribution is to gain a better understanding of human-machine conversation and to identify paralinguistic cues that can be used to build better CAs. This is the objective of the exploration in this chapter.


108 AOL Instant Messenger (AIM) http://www.aim.com 
MSN Messenger http://get.live.com/messenger/
Google Talk http://www.google.com/talk 
Yahoo! Messenger http://get.live.com/messenger/ 

7.2 Paralinguistic Analysis


Besides the linguistic features [348] discussed in Chapter 6, the understanding of 
paralinguistic cues is also important in spoken dialogue systems, especially in non-verbal 
communication using grunt-like utterances such as “eh”, “ah”, and “um” [349], facial 
expressions, intonations, abbreviations, prosody markers, and so forth. Such utterances are 
frequently used to express a reaction to the interlocutor’s utterance in a dialogue, and they 
also express some sort of intention, attitude, or emotion. 


No.  ID      Utterance
1    AINI    Hey :), nice to meet u.
2    U1037   me too
3    AINI    I'm glad we agree, U1037.
4    U1037   btw, where are u from?
5    AINI    I am originally from Perth, Australia. Now I live in here. Where are you?
6    U1037   that's good :). I love Perth too.
7    AINI    Thanks. Next question?
8    U1037   do people usually earn money when they work?
9    AINI    Go for it! Yes for me.
10   U1037   wow... excellent :):)
11   AINI    Thanks for the compliment. Tell me a little about your life in smilesmile.
12   U1037   no, I prefer yourself
13   AINI    Are you serious? You are not the only one. Yes having self-esteem is important for robots too.
14   U1037   are you a robot
15   AINI    How did you know I am a machine?
16   U1037   YOU JUST TOLD ME
17   AINI    I'm just checking to see if you are awake
18   U1037   DON'T TEST ME
19   AINI    I don't have any reason to do that to you.
20   U1037   Wanna gooooooooo
21   AINI    OK, I will gooooooooo.
22   U1037   cu [emoticon]
23   AINI    TTYL, U1037

Figure 7.1: A typical Single Session Conversation between AINI and user U1037


The excerpt in Figure 7.1 was taken from a single-session IM conversation between AINI and “her” buddy (ID U1037). The exchange is populated with paralinguistic cues. Each new session starts with AINI sending a greeting (utterance #1), e.g. “Hey :), nice to meet u”. These greetings indirectly garner some information about the user’s identity, such as their name (in this experiment, the user name was replaced by the user ID U1037, utterance #3), and the user’s favourite place (utterance #6), which turned out to be the same city where AINI is located. In utterance #4, AINI’s buddy used two shorthand expressions: “btw” (by the way) and “u” (you). The user also liked to use prosody markers, such as pauses, and emphasised the expression by duplicating the emoticons (utterance #10). Since AINI’s knowledge bases are not equipped with full IM features such as prosody markers and emoticons, AINI failed to recognise the double smiley “:) :)” in the user input. This drawback led AINI to generate the dynamic response “Tell me a little about your life in smilesmile”. However, AINI could recognise a single smiley, as in utterances #6 and #22; in these cases AINI was able to understand the queries and amuse the user.

At the end of the session, both participants exchanged similar farewells (utterances #20 to #23). Although the user used the US-accented “wanna” followed by the shorthand “cu” in utterances #20 and #22, AINI was still able to distinguish the intended meaning and replied with the simple abbreviation “TTYL” (talk to you later). Interestingly, in utterances #20 and #21, both the user and AINI reduplicated letters (“gooooooooo”) to represent expressive intonation. These transmissions constitute a single session and include the use of shorthand, emoticons, prosody, loose grammar and minimal punctuation in IM, as reported in previous research [297, 306, 325, 326].

7.2.1 Intonations with Interjections and Fillers 

Interjections are expressions of one or a few syllables: short exclamations such as oh, um or hi. They have no real grammatical value, yet they are used quite often, and more frequently in speech than in writing. When interjections are inserted into a sentence, they have no grammatical connection to it. Most interjections are more characteristic of everyday conversation than of formal or public ‘task-oriented’ speech [329]. Fillers, on the other hand, are expressions such as er and um, also known as "hesitation intonations". Both interjections and fillers are very common in conversation in almost every culture and language [350-352].


Table 7.1: Log-likelihood Ratio of Interjections and Fillers

        BNC                           Instant Messaging
Word    CONV     LL          TOS      AINI   LL       Human
Yeah    13955    +32679.5    3741     15     -23.97   37
Oh      9884     +33062.1    1746     11     -13.7    24
No      7830     +18948.4    2034     8      -11.86   19
Er      5075     -10677      10913    0      -11.72   6
Mm      5202     +9146.9     1768     0      -15.63   8
Yes     4247     +303.0      3562     71     -5.82    25
Erm     3946     -5387.6     7454     0      -7.81    4
Mhm     392      -1158.2     947      0      -3.91    2
Hello   392      +939.5      103      24     +0.10    13
Hi      73       +250.7      12       21     +0.15    11
Um      7        -127.5      41       0      -5.86    3

CONV: frequency (per million words) in demographically sampled speech (conversation)

TOS: frequency (per million words) in context-governed (task-oriented) speech

LL: log-likelihood of the difference between BNC conversation and task-oriented speech, and between AINI and human users in IM


(Bar chart: frequency of each interjection and filler in the IM corpus, for AINI and for human users.)

Figure 7.2: Frequency list of Interjections and Fillers in IM Conversation Agents

In the corpus collected in this experiment, hesitation fillers such as er and erm were noted to occur more often in IM than in other spoken or written conversation. Another category is discourse markers such as mhm and um; however, these are rarely used and are considered collectively in the analysis. Table 7.1 and Figure 7.2 also show that these hesitations are used far more frequently by IM human users than by AINI. AINI also uses fewer interjections and prefers a more formal clause structure. These results suggest that AINI’s buddies used vague markers to express uncertainty or hesitation. For example, hesitation fillers were typically left stranded in the conversation, leaving it to the hearer (AINI) to complete the message. Depending on the available knowledge bases, AINI sometimes uses interjections as fillers when no good matching answer is found in the stimulus-response knowledge base. However, AINI appears biased towards formal greetings such as hi (LL: +0.15) and hello (LL: +0.10), although these differences are slight. Figure 7.3 visualises this using one-word frequency: 12% of the transmissions between AINI and ‘her’ human buddies consist of a single word. For affirmative responses, AINI favours the formal yes over yeah, as shown in Figure 7.2. In the human-human transcripts of the BNC, by contrast, yeah is strongly favoured in conversation relative to task-oriented speech (TOS) (LL: +32679.5), whereas the corresponding difference for yes is much smaller (LL: +303.0).

Figure 7.3: Visualisation of the Paralinguistic Cues Populated in IM Conversation Agents based on One-word Frequency
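Table 7.1 reports signed log-likelihood (LL) scores. The total word counts of the AINI and human sub-corpora are not given here, so the IM values cannot be reproduced; the sketch below only shows how a signed two-corpus log-likelihood is commonly computed, with expected counts derived from the pooled relative frequency and G2 = 2 Σ O ln(O/E). It is an assumption that Table 7.1 was computed in this way. With one degree of freedom, an |LL| of at least 3.84 corresponds to p < 0.05, so values such as those for hi (+0.15) and hello (+0.10) fall well below that threshold.

```python
import math

def log_likelihood(a: int, b: int, c1: int, c2: int) -> float:
    """Signed two-corpus log-likelihood (G2) for one word.

    a, b   -- frequency of the word in corpus 1 and corpus 2
    c1, c2 -- total number of words in corpus 1 and corpus 2
    A positive result means the word is relatively more frequent in corpus 1.
    """
    # Expected counts if the word had the same relative frequency in both corpora.
    e1 = c1 * (a + b) / (c1 + c2)
    e2 = c2 * (a + b) / (c1 + c2)
    g2 = 0.0
    if a > 0:                     # 0 * ln(0) is taken as 0
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    g2 *= 2.0
    return g2 if a / c1 >= b / c2 else -g2

def per_million(count: int, corpus_size: int) -> float:
    """Normalise a raw count to a frequency per million words (as in CONV and TOS)."""
    return count * 1_000_000 / corpus_size
```

A call such as log_likelihood(15, 37, n_aini, n_human) would compare "yeah" between AINI and the human users, where n_aini and n_human are hypothetical placeholders for the sub-corpus sizes.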


7.2.2 Abbreviations with Acronyms and Shorthand 

One of the distinctive features of IM is the use of creative forms of abbreviation such as acronyms, shorthand and phonetic substitutions, as observed by many researchers [295], [353], [297, 325, 326], [354]. In English, the widespread use of acronyms, initialisms and contractions is a relatively new paralinguistic phenomenon, having become highly popular in the 20th and 21st centuries. With the advancement of science and technology, many new complex terms and concepts have been created, and abbreviated terms have become increasingly widespread in everyday textual and verbal communication. Shorthand is used predominantly to speed up communication via IM, and abbreviation has become one of the most common methods of new word formation [355]. Sali Tagliamonte and Derek Denis [326] studied the IM techniques and behaviours of over 70 Toronto teenagers and found that IM is actually closer to a written version of normal conversation than to writing a letter or email.

Acronyms are also used extensively in online chat conversations. These short forms can be used to save time and to communicate specific meanings. For example, the abbreviation POS (Parent Over Shoulder) might be used to inform the chat buddy that a parent has just walked in. Traditionally, acronyms and abbreviations are shortened versions of long compound words, technical terms or titles; for example, WWW stands for ‘World Wide Web’.

This study shows that such shortened word forms in text messages also occur in human-machine conversation in MSN Messenger. Since most of AINI’s buddies are not concerned about grammatical conventions when typing instant messages, they use abbreviations, acronyms and phonetic substitutions extensively. For example, “How are you doing?” becomes “how u doin?”, and “before” is replaced by “b4”. This helps to speed up the typed conversation among buddies online.


Table 7.2: Short forms used in IM Conversation Agents

Shorthand   Expression          Freq   Shorthand      Expression    Freq
U/u         you                 43     ru             are you       17
k           ok                  37     coz            because       17
btw         by the way          35     nt             no thanks     15
ic          I see               29     oic            oh I see      13
bb          bye bye             28     B4             before        12
bfn         bye for now         25     plz            please        10
r           are                 24     dunno          don't know    9
HT          hi there            22     s              see           8
msg         message             20     wc             welcome       8
CU          see you             19     Y?             why           7
gotta       got to              18     wanna          want to       6
lol         laughing out loud   18     unidentified                 18


The chat logs show that a wide variety of acronyms and abbreviations were used: 422 unique short forms were found in the 3,280 messages, and U/u (“you”) was the most commonly used. Table 7.2 shows a sample of the 422 short forms observed and recorded during the study; 18 were labelled “unidentified” because their meanings were unknown. These results differ from those of the OmniPod survey [356] of IM use in the business environment, because this study was not restricted to a workplace environment but was based on open discussions in an unrestricted domain.


It was observed that most of the expressions in the logs are not acronyms but shorthand. Words are often shortened to emulate their spoken forms in a US accent; for example, the suffix ‘a’ replaces the preposition ‘to’ or ‘of’, or the pronoun ‘you’, as in the following examples:

want to → wanna
got to → gotta
going to → gonna
I’ve got you (I understand) → gotcha
sort of → sorta


Interestingly, both the US-accented “wanna” and “gotta” were also found in the human-machine communication, even though such forms are generally regarded as non-standard and best avoided in both speech and writing. This indicates the informality of the users' online speech: “they are chatting”.

In short, the data show that paralinguistic abbreviations serve everyday needs and appear not only in human-human conversation but also in human-machine conversation. It could be said that the use of paralinguistic cues adds richness to the medium of communication. Most IM human users used shortened forms in their initial communications and then changed to more formal typed messages after a while, when they realised that AINI appeared unable to understand the questions. Although AINI’s domain-specific and AAA knowledge bases contain a limited set of abbreviations, such as SARS, H5N1, AIDS, IT (information technology), USA (United States of America), NY (New York), CU (see you) and TTYL (talk to you later), these are not the short forms regularly used in IM conversation. Hence, it is proposed that paralinguistic properties should be included in future CA knowledge bases to provide more human-like interaction.
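One way to act on this proposal is a normalisation pass that expands known short forms before the input reaches pattern matching. The sketch below is hypothetical: the dictionary holds a subset of Table 7.2 (ambiguous single letters such as "s" and expressive forms such as "lol" are left out), and it does not reflect how AINI's knowledge bases are actually stored.

```python
import re

# Hypothetical lookup table: a subset of the short forms in Table 7.2,
# keyed in lower case.
SHORT_FORMS = {
    "u": "you", "r": "are", "ru": "are you", "btw": "by the way",
    "b4": "before", "plz": "please", "coz": "because", "dunno": "don't know",
    "msg": "message", "cu": "see you", "oic": "oh I see", "ic": "I see",
    "bfn": "bye for now", "wanna": "want to", "gotta": "got to",
}

TOKEN = re.compile(r"[A-Za-z0-9']+")

def normalise(message: str) -> str:
    """Expand known short forms; other tokens and punctuation are left unchanged."""
    def expand(m: re.Match) -> str:
        return SHORT_FORMS.get(m.group(0).lower(), m.group(0))
    return TOKEN.sub(expand, message)
```

For example, normalise("btw, where are u from?") returns "by the way, where are you from?". Expanding before matching keeps the stimulus-response patterns in standard English; the trade-off is that any short form missing from the table still reaches the matcher unchanged.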

7.2.3 Facial Expressions with Emoticons or Smileys 

Non-verbal expressions such as emoticons or smileys have played another important role since the birth of CMC. An emoticon is a little picture created from plain text, while larger pictures created from plain text are known as ASCII Art. An emoticon is a sequence of ordinary printable characters, such as "^_^", "XD", "X8", "=D" or "=p", or a small image, intended to represent a human facial expression and convey an emotion. To compensate for the lack of the parallel communicative channel provided by face-to-face interaction, IM users have, over time, developed and widely adopted what Werry [357] terms “orthographic strategies”, which provide mechanisms for reflecting elements such as marked facial expressions and intonation. Studies on emoticons report positive effects on CMC [305, 324, 339, 358]; they indicate that emoticons improve user experiences and interactions among participants and build enthusiasm toward participation and friendliness in intercultural communication [338], [339]. Emoticons are a form of paralinguistic cue commonly used as extended punctuation symbols in e-mail, IM and online chat. In the smiley “:-)”, the colon “:” represents the eyes, the dash “-” represents the nose, and the right parenthesis “)” represents the mouth. To express emotions or facial expressions, there is an established IM ‘lingo’ of ‘smileys’ that are widely understood and used by the conversation agents. An example of the presentation of an emoticon is shown in Figure 7.4.

(Diagram: a scale running from a photo, which is specific, to an emoticon, which is universal.)

Figure 7.4: Universal expression from Human to Computer Communication by Scott McCloud [359]

The emoticons :) and :-) represent happiness; likewise, :( and :-( represent unhappiness. However, IM users have created their own emoticons to convey other feelings, such as surprise, confusion and anger, to enrich their communication. The different combinations used during the conversations are shown in Table 7.3 and Figure 7.5.


Table 7.3: Facial expressions with Emoticons or Smileys used in IM Conversation Agents

Smiley   Meaning      Freq
:)       Happy        38
:(       Sad          24
:0       Surprised    15
:'(      Crying       9
:D       Big smile    15
:*)      Kiss         17
:@       Angry        13
Z        Boy          15
X        Girl         13
F        Love         27

Figure 7.5: Frequency of Smiley and Emoticons used in the IM Conversation Agents

The excessive use of emoticons by AINI’s buddies could cause some potential problems. First, MSN limits the number of emoticon images per instant message. Second, sentences with too few words and too many images are less readable. However, since emoticons offer shortcuts in communication, it is almost certain that a few will be included in IM messages.

On the other hand, it was surprising to note that only a few emoticons appeared across the sessions (see the visualisation in Figure 7.3). Of the 186 emoticons used, 38 (20%) were “smileys”. Moreover, a small number of IM users were responsible for the majority of the 186 emoticons, a pattern consistent with the finding of Walther and D’Addario [121]. Some users also emphasised the emotion in their queries by duplicating the smiley's mouth, for example:

:))))))) → hearty laughter

:(((((((( → deep sadness
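Repeated mouths such as those above, and doubled smileys like the ":) :)" that AINI failed to recognise (producing "smilesmile" in Figure 7.1), could be recognised before pattern matching. The sketch below is a hypothetical tokenizer, not part of AINI: it finds text emoticons built from eyes, an optional tear or nose, and a repeated mouth, and reports each one's meaning with an intensity equal to the number of mouth characters. The mouth-to-meaning mapping loosely follows Table 7.3 and covers only a few emoticons.

```python
import re

# Hypothetical mouth-to-meaning mapping, loosely following Table 7.3.
MOUTHS = {")": "happy", "(": "sad", "D": "big smile", "@": "angry",
          "O": "surprised", "0": "surprised"}

# Eyes ':' or '=', optional tear "'", optional nose '-', then one mouth
# character repeated any number of times. The look-arounds prevent matches
# inside words or times such as "8:00".
EMOTICON = re.compile(r"(?<![A-Za-z0-9])[:=]'?-?([)(D@O0])\1*(?![A-Za-z0-9])")

def find_emoticons(message: str) -> list[tuple[str, int]]:
    """Return (meaning, intensity) pairs; intensity counts repeated mouths."""
    found = []
    for m in EMOTICON.finditer(message):
        mouth = m.group(1)
        if "'" in m.group(0) and mouth == "(":
            label = "crying"
        else:
            label = MOUTHS[mouth]
        found.append((label, m.end() - m.start(1)))
    return found
```

For example, find_emoticons("wow... excellent :):)") returns two ("happy", 1) pairs, and a smiley with seven closing parentheses returns ("happy", 7), which a response generator could treat as stronger emphasis.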

7.2.4 ASCII Arts


As network bandwidth increases, many text-based IM clients provide a full range of emotional modifiers, from tiny emoticons to enormous animations across the entire chat space. Excessive use of these features may ruin the simplicity of text. One of the major visual design issues for text-based interfaces is how to maintain a consistent and coherent typographic layout while providing a secondary channel for communicating paralinguistic cues. Figure 7.6 shows examples of paralinguistic cues (ASCII art) used by one of AINI’s buddies (user ID U1049), in which the cues outweigh the meaning of the words. In utterances #1 and #7, the ASCII art represents “love”, and in utterances #5 and #11 it represents “health”.


(Continued from previous conversation.)

No.  ID      Utterance
1    U1049   [ASCII art representing “love”]
2    AINI    Is that a sign?
3    U1049   How u think?
4    AINI    My mind utilizes a sophisticated computer algorithm.
5    U1049   [ASCII art representing “health”]
6    AINI    Oh.
7    U1049   [ASCII art representing “love”]
8    AINI    I don't think very many people do that.
9    U1049   Are u sure
10   AINI    I am as sure as I can be.
11   U1049   [ASCII art representing “health”]
12   AINI    I only hear that type of response less than five percent of the time

(Conversation continues...)

Figure 7.6: ASCII Arts used in IM Conversation Agents
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ASCII Arts consist of larger pictures drawn with plain text. Such pictures do not have to be 
limited to just ASCII characters. These “hand-crafted” sequences are not computer
generated: users must spend extra effort on each word, modifying its typography to convey
complex emotions even in a short sentence, as shown in the example. This typographic
approach attempts to convey meanings or messages that may not be easily achieved with simple
emoticons. Surprisingly, only one of AINI’s buddies in this study, U1049, used this approach
to converse with AINI, sending the distinctive ASCII arts for “love” and “health” shown in
Figure 7.6.

Since AINI’s knowledge bases are not currently equipped to interpret ASCII Arts, AINI
normally generates generic dynamic responses to inputs that use this typographic approach.
Typical responses are “Is that a sign?”, “I don’t think very many people do that”, “Oh” and
“I only hear that type of response less than five percent of the time”. As a result,
AINI’s buddies may become annoyed and irritated, as they may perceive AINI as “rude or
stupid”. Although AINI could learn new expressions in the future, it is almost impossible for
CAs to recognise all “hand-crafted” expressions. This illustrates a major deficiency of CAs:
their inability to interpret meaning from visual or graphical expressions such as ASCII
Arts.
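A CA could at least recognise that it has received a drawing rather than words, and answer accordingly instead of producing a generic reply. The sketch below is a hypothetical heuristic, not part of AINI: it flags a message as likely ASCII art when the share of non-space characters that are neither letters nor digits reaches an assumed threshold of 0.6.

```python
def symbol_ratio(message: str) -> float:
    """Fraction of non-space characters that are not letters or digits."""
    chars = [c for c in message if not c.isspace()]
    if not chars:
        return 0.0
    return sum(1 for c in chars if not c.isalnum()) / len(chars)


def looks_like_ascii_art(message: str, threshold: float = 0.6) -> bool:
    """Heuristically flag messages dominated by punctuation and symbols.

    The 0.6 threshold is an assumption chosen for illustration only.
    """
    return symbol_ratio(message) >= threshold


if __name__ == "__main__":
    for text in ["((_)) ((_)) ((_)) ((_)) ((_))", "/ o)\\", "Are u sure"]:
        print(repr(text), looks_like_ascii_art(text))
```

A rule this crude also flags plain emoticons such as ":))))))", and it misses art drawn with letters (the "II II II II II" strokes in Figure 7.6), so a practical filter would need more than a character ratio.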

7.2.5 Prosody Markers with Pauses and Voice Pitch 

Prosody refers to aspects of speech such as tone of voice, or inordinately slow or fast speech. 
Punctuation is one indicator of prosody in text. However, compared with formal written
documents, instant messages are characterised by an absence of punctuation [360]. Given the
similarity between IM and spoken communication, prosody can also play an important role in
IM by helping receivers interpret incoming messages.
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In the corpus collected, voice pitch and prosody have been used to give emphasis and 
expression in many ways. Duplicated letters are frequently used to represent expressive
intonation, as shown in utterances #20 and #21 in Figure 7.1. More examples are shown in
Figure 7.7.


hellllloooooooooooooowwwwwwwwww 

arggggggggggg gggggggggggggggggg 

happpyyyyyyyyyyyyyyy 

WOOP~!!!!!!!!!!!!! 

ooooooooooh 

awwww 

woooooohooooooo 

:)))))) 

sunnnnnnnnnnyyyyyyyyyyyyyyyyyyyyyy 
byeeeee all! 

OH NOOOOOOOOOOOOOOo 

NNNNnnnnnnnoooooooooooooooooooooooooo 

Byeeeeeeeeeeee 

yeeeeeeeeeeeeeha! 

Figure 7.7: Prosody Markers used in IM Conversation Agents 


These prosody markers appear to obscure some of the important characteristics of
human-machine conversation, because CAs are unable to understand their meaning and
significance. Another phenomenon observed is the use of capital letters. In the corpus
collected, human IM users also use capital letters, i.e. ‘SHOUTING’ or ‘SCREAMING’, to
represent raised voice pitch or volume, as is also the case in email. As shown in Figure 7.1,
AINI greeted user U1037 at the beginning of the conversation, but when AINI failed to
continue the conversation, forwarding repeated questions about “her” identity as a CA or
robot in utterances #16 and #18, user U1037 replied in CAPITAL LETTERS. This phenomenon
also occurred in the Loebner Prize contest. According to L. Hamill [361], the Loebner Prize
judges gave the CA’s performance poor ratings because the CA could not manage standard
conversation openers, ignored the judge, changed the subject and replied inappropriately.
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7.3 Summary 


In this chapter, a study of the use of paralinguistic cues in AINI’s IM exchanges with 65
buddies was carried out. Based on the corpus collected, it was observed that usage varied
between users, especially with regard to paralinguistic properties. It was noted that, at the
beginning of the conversation, human IM users appeared to treat AINI as a human buddy and
expressed their emotions through various forms of paralinguistic cues.

It is observed that CAs’ messages in IM are more machine-like than IM conversations
between human users. Human buddies frequently employ paralinguistic cues, and they tend to
use more intonation, abbreviations, facial expressions, ASCII Arts and prosody markers
than machines. Finally, a comparison of IM transcripts between humans and machines shows
that IM involving a CA differs significantly from human-human IM conversation. An objective
of this thesis was to develop a practical framework for CAs that are able not only to
understand conversations intimately but also to closely mimic human conversation. To achieve
this goal and improve the performance of practical CAs, such paralinguistic features should
be included in the CA’s knowledge bases in the future, as discussed in Chapter Eight.
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CHAPTER 8 


CONCLUSIONS 


8.1 Introduction 

This thesis has presented the investigation and proposal of a practical framework for the
development of conversation agents (CAs). In the course of this study, the research work has
included a literature review, an assessment of the state of the problem, and the development
of a CA architecture called AINI to demonstrate and evaluate the proposed concept through
real-time experiments. AINI’s knowledge comprises both open-domain and domain-specific
knowledge bases in order to provide “meaningful” interaction with users. An automatic
knowledge acquisition tool (AKEA) has also been developed to assist in acquiring information
to build the knowledge bases of the CA. In addition, the research has proposed a Web
Knowledge Trust Model (WKTM) to address issues of the trustworthiness of Internet resources.
The study then examined the implications of the linguistic features and paralinguistic cues
used by human IM users and AINI. This chapter presents a summary of the study, its
contributions and the limitations of the research, followed by a discussion of directions
for future work.

8.1.1 Summary 

This study has focused on the problem of developing a practical framework for CAs. The
research has provided a better understanding of the system requirements and has developed a
systematic approach for the construction of intelligent CAs based on an agent architecture
using a modular N-tiered approach.
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A number of applications and modules have been presented to achieve the research 
objectives of developing a practical framework and improving the interoperability of the CA
architecture through modular design, as illustrated in the AINI framework [112, 113, 117].
The Natural Language Understanding and Reasoning (NLUR) module, for example, was developed
to undertake full sentence parsing, producing grammatical categories and grammatical
relationships and extracting pertinent information, such as Noun Phrases and Verb Phrases,
through the agent’s knowledge, as shown in Appendix B. This module was created based on a
modular design and is extensible through polymorphic inheritance of components.

A multilevel natural language query approach utilising a multimodal interface and a domain
knowledge matrix model was also used. Rather than just providing an answer, this approach
aims to better understand the conversation and to mimic human conversation while prolonging
the exchange between the participants. The Domain Knowledge Matrix Model (DKMM), with
multiple domain knowledge bases in the CA’s architecture, aims to enable the CA to provide
the most appropriate response to users.

The proposed Web Knowledge Trust Model (WKTM) and Automated Knowledge Extraction Agent
(AKEA) are used to ensure the trustworthiness of the websites used in developing the CA’s
domain-specific knowledge bases. With its flexible knowledge management application module,
AKEA forms a useful tool for the automatic acquisition of knowledge from different domains.
The study demonstrates that the use of WKTM and AKEA improves the process of building
knowledge bases from existing online documents, as well as establishing the trustworthiness
of the information sources.
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In order to assess the proposed framework, relevant tools have been developed and an 
evaluation of their effectiveness has been carried out to validate the performance of the
system. An interactive chat visualisation system called VisualChat, for example, has been
developed and used for text analysis, as described in Chapters 6 and 7. VisualChat can be
used to visualise and analyse human-machine conversation logs. It is capable of displaying
the timelines of several textual conversations simultaneously and enables the scrutiny of
utterance lengths and specific recurring keywords. VisualChat can also generate a graphical
display that allows comparison between the features of human and machine conversations.
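The kind of measurement VisualChat displays, utterance lengths per speaker and counts of recurring keywords, can be computed from a log in a few lines. This sketch is an illustration only; the function name, the word-tokenisation rule and the (speaker, utterance) log format are assumptions, and the graphical side of VisualChat is not reproduced.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def chat_statistics(
    log: Iterable[Tuple[str, str]], keywords: Set[str]
) -> Tuple[Dict[str, List[int]], Dict[str, Counter]]:
    """Per-speaker utterance lengths (in words) and keyword frequencies.

    `log` is a sequence of (speaker, utterance) pairs in timeline order.
    """
    lengths: Dict[str, List[int]] = defaultdict(list)
    counts: Dict[str, Counter] = defaultdict(Counter)
    for speaker, utterance in log:
        words = re.findall(r"[a-z']+", utterance.lower())
        lengths[speaker].append(len(words))
        counts[speaker].update(w for w in words if w in keywords)
    return dict(lengths), dict(counts)


if __name__ == "__main__":
    log = [
        ("U1049", "How u think?"),
        ("AINI", "My mind utilizes a sophisticated computer algorithm."),
        ("U1049", "Are u sure"),
        ("AINI", "I am as sure as I can be."),
    ]
    print(chat_statistics(log, {"sure"}))
```

Applied to four turns from Figure 7.6, the sketch gives lengths of 3 and 3 words for U1049 against 7 and 8 for AINI; the per-utterance lists keep timeline order, so they could be plotted directly.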

In the evaluation of the CAs, both laboratory experiments and public experiments with online
users in real time have been carried out. The results have shown that the proposed system is
effective in engaging users and in attempting to provide the most relevant answers instead
of random replies. The study observed that CAs and human users share common properties in
their use of linguistic features and paralinguistic cues. These human-computer interactions
have been analysed, and the findings contribute to the understanding of interaction between
humans and CAs. Results from the analysis will be useful for the future development of CAs
that exploit the commonalities found in this research.

8.1.2 Contributions 

The contributions made by this study during the course of design, implementation, 
evaluation and analysis of the proposed framework are described as follows: 

• Presentation of a novel approach toward the development of practical CAs that allows 
users to communicate efficiently, flexibly and reliably. The framework developed 
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demonstrates an effective implementation and an exploration of the new paradigm of 
Computer Mediated Conversation (CMC) through CAs with an ability to understand
conversations and to closely mimic human conversation.

• Presentation of the proposed framework and its ability to re-use and encapsulate
expertise such as domain knowledge, natural language query and human-computer
interface through plug-in components. A modified N-tiered architecture with additional
channel service and domain service tiers contributes towards the goal of developing
practical CA frameworks. Such an architecture combines the advantages of simplicity,
scalability, robustness and customisation.

• Presentation of the anthropomorphic modular design in the CA framework, which allows
users to communicate more effectively through a multi-channel communication interface
including the web, mobile devices and other network services.

• Presentation of a unique platform that enables experimentation with, and analysis of, the
relationship between language and behaviour in the context of human-machine
conversation. Results showed that the top-down natural language query approach is a
goal-driven approach similar to the way humans process language. The system utilises a
multilevel natural language query approach, which has a higher likelihood of providing a
better answer than systems that use a single strategy. This multilevel natural language
query consists of Spell Check; Natural Language Understanding and Reasoning (NLUR);
FAQChat; Index Search; Pattern Matching and Case Based Reasoning; and Supervised
Learning by Domain Experts.
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• Presentation of the Domain Knowledge Matrix Model (DKMM) in CA architecture 
which is capable of encompassing both open-domain and domain-specific knowledge.
This will facilitate the customisation of CAs and also enable them to generate reliable
and relevant responses and to prolong the conversation.

• Presentation of the Web Knowledge Trust Model (WKTM) for identifying reputable,
credible, reliable and accountable websites. This novel approach contributes toward
building CAs’ domain-specific knowledge from trusted sources. Despite the use of
semi-structured data from the World Wide Web (WWW) as the knowledge source, the
proposed model achieved an acceptable degree of confidence in identifying and matching
knowledge on the Web.

8.2 Limitations 

While the design and implementation of the proposed CA has moved closer towards the 
original goal of developing flexible and adaptable CAs, the research also revealed several 
underlying limitations. These limitations are categorised as agent knowledge (domain 
knowledge), agent brain (natural language query) and agent body (human-computer 
interface) issues. They are detailed in the following subsections. 

8.2.1 Agent Knowledge Issues 

Experience from knowledge extraction for CAs using WKTM has shown that the present process
is labour intensive. This limitation is caused by the need to define a set of seeds, extract
the corpus, evaluate the corpus, and determine the trustworthiness of the websites using the
proposed quantitative and qualitative assessments. The need for a fully automatic approach
that can determine the seeds from the extracted
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knowledge becomes apparent. A proposed solution could be an attempt to identify the seeds 
automatically using a set of heuristics and term expansion methods based on widely used
sources such as WordNet [362] [363] [364], instead of bootstrapping from unstructured
text.
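The seed-identification idea above could start from a simple breadth-first term expansion. In the sketch below, a small hand-written table stands in for a lexical database such as WordNet; its entries, the function name `expand_seeds` and the `depth` parameter are illustrative assumptions rather than part of WKTM or AKEA.

```python
from typing import Dict, Iterable, List, Set

# Toy related-term table standing in for a lexical database such as
# WordNet; the entries are illustrative only.
TOY_THESAURUS: Dict[str, List[str]] = {
    "flu": ["influenza", "grippe"],
    "influenza": ["flu", "grippe"],
    "outbreak": ["epidemic", "eruption"],
    "epidemic": ["outbreak", "pandemic"],
}


def expand_seeds(seeds: Iterable[str], thesaurus: Dict[str, List[str]],
                 depth: int = 1) -> Set[str]:
    """Breadth-first term expansion from seed words, up to `depth` hops."""
    expanded = set(seeds)
    frontier = set(expanded)
    for _ in range(depth):
        next_frontier = set()
        for term in frontier:
            for related in thesaurus.get(term, []):
                if related not in expanded:
                    expanded.add(related)
                    next_frontier.add(related)
        frontier = next_frontier
    return expanded


if __name__ == "__main__":
    print(sorted(expand_seeds(["outbreak"], TOY_THESAURUS, depth=2)))
```

With NLTK and its WordNet corpus installed, the table could instead be filled from `wordnet.synsets(term)` and each synset's `lemma_names()`. Expanded terms would still need filtering, since broad senses (such as "eruption" for "outbreak" in the toy table) can pull in off-topic pages.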

In addition to the above limitation, an analysis of paralinguistic cues has shown that the
excessive use of emoticons, smileys, intonation, abbreviations, ASCII Art and prosody
markers by the CA’s buddies is a problem: it hinders the continuation of the conversation
because the CA is unable to understand these paralinguistic features. Hence, means of
handling these paralinguistic properties should be included in CAs’ knowledge bases in the
future.

8.2.2 Agent Brain Issues 

Another limitation of the CA system is its lack of true support for anaphora and context in
the agent brain. Anaphora are references to previously named objects. In English, pronouns
like “I”, “me”, “you”, “he”, “she”, “it” and so on usually refer to previously named people
or things. Most of the time, CAs will ask for the user’s name and incorporate it into future
responses. Some CAs also attempt to identify the “topic” of conversation. However, simply
remembering the “name” or any other piece of information and storing it in a variable for
later use does not constitute “understanding”. The CA has no idea of what is going on in any
sophisticated sense, nor does it have any basis for “reasoning” about the words. Pursuing a
dialogue over a few exchanges will show that a CA really has no clue what it is talking
about. Some CA developers claim that their software can really identify the target of an
anaphoric reference, but most such claims rest on keyword matching. With the incorporation
of additional database tables and intelligent natural language understanding techniques, it
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should be possible to give CAs long-term or “permanent” memory through identification and 
personalisation of the CAs’ buddies. 
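As a minimal sketch of the “additional database tables” idea, assuming an SQLite store keyed by buddy ID, per-buddy facts such as a name or a recurring topic could persist across sessions. The table `buddy_memory` and the helpers `open_memory`, `remember` and `recall` are hypothetical. As the paragraph above stresses, this only stores facts; it does not by itself give the CA any understanding of them.

```python
import sqlite3
from typing import Optional


def open_memory(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical per-buddy memory table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS buddy_memory ("
        " user_id TEXT NOT NULL,"
        " slot TEXT NOT NULL,"
        " value TEXT NOT NULL,"
        " PRIMARY KEY (user_id, slot))"
    )
    return conn


def remember(conn: sqlite3.Connection, user_id: str, slot: str, value: str) -> None:
    """Store or overwrite one fact (e.g. slot 'name') for a buddy."""
    conn.execute(
        "INSERT OR REPLACE INTO buddy_memory (user_id, slot, value) VALUES (?, ?, ?)",
        (user_id, slot, value),
    )
    conn.commit()


def recall(conn: sqlite3.Connection, user_id: str, slot: str) -> Optional[str]:
    """Return the stored fact, or None if the buddy never mentioned it."""
    row = conn.execute(
        "SELECT value FROM buddy_memory WHERE user_id = ? AND slot = ?",
        (user_id, slot),
    ).fetchone()
    return row[0] if row else None
```

Passing a file path instead of ":memory:" to `open_memory` would keep the facts between sessions, which is what makes the memory long-term.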

8.2.3 Agent Body Issues 

Affect [365] and emotional intelligence [366] play crucial roles in human-computer
interfaces and are thus important components of CAs. A face with many expressions may lead
to greater engagement, believability and amusement, and the CA could also command more
attention [130]. Currently, AINI’s multimodal interface in the WebChat module supports only
the synthesis of prosodic and lip-synchronised speech, and co-verbal gestures through an
embodied avatar (anthropomorphism). Unfortunately, AINI lacks emotional facial expressions
due to computational constraints. One solution to this limitation, which would fulfil the
desired communicative function and express the current emotional state within the
conversation, is an Emotion Recognition Module (ERM) incorporated into the CA’s framework.
The ERM could be implemented by adhering to the XML interfaces at the channel service tier
of the CA’s architecture and by including appropriate tags in the original input.
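A minimal sketch of how appropriate tags might be added to the original input before it reaches the channel service tier. The element names (`input`, `emotion`, `text`), the attributes and the emoticon lexicon are assumptions, since the thesis does not specify the XML schema, and a real ERM would replace the lookup with a proper emotion classifier.

```python
import xml.etree.ElementTree as ET

# Toy emoticon-to-emotion lexicon; a real ERM would use a trained classifier.
EMOTICON_EMOTIONS = {":)": "happy", ":(": "sad", ":@": "angry"}


def tag_input(user_id: str, text: str) -> str:
    """Wrap a raw channel message in XML carrying a hypothetical <emotion> tag."""
    emotion = "neutral"
    for emoticon, label in EMOTICON_EMOTIONS.items():
        if emoticon in text:
            emotion = label
            break
    root = ET.Element("input", {"channel": "im", "user": user_id})
    ET.SubElement(root, "emotion").text = emotion
    ET.SubElement(root, "text").text = text
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(tag_input("U1049", "Hi :)"))
```

Keeping the emotion as a separate element leaves the original text untouched, so tiers that ignore the tag still receive the user's words unchanged.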

8.3 Future Work and Directions 

There are many possible directions in which this research could be continued. This study has
raised a number of fundamental questions that are worth further investigation. As stated in
Chapter 3, the proposed CA framework was designed using a modified N-tiered architecture
that shares common software services, which can be accessed from any user interface, domain
or task. Therefore, the CA could be supported on the web as well as on mobile or other
network services. The continual development of CAs could lead to new domain applications,
and CAs can be applied to new and emerging domains and tasks.
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Today, research on CAs has evolved from software robots to conversation agents existing in 
desktop computers or handheld devices. Another potential domain application might be the
“companion conversation robot (CCR)” [367-371] or “social robot” [340], an autonomous
mobile robot [372-374] that can interact and communicate with humans. This application
requires a conversation software agent equipped with a more complex decision-making
process and embedded speech recognition technology. This type of CCR may one day act as a
‘partner’ to its users, offering assistance and companionship in domains such as
education, health care, household work and entertainment [375].

It is the author’s hope that this study will contribute towards the continual development of a 
practical and human-like CA for the benefit of the community. It is believed that this work 
has taken some small steps towards this ultimate goal. 
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APPENDIXES 


APPENDIX A: 


ANNOUNCEMENT 


• Read the Announcement carefully.
• Click “NEXT” button to continue.
• Click “CLOSE” button to stop the study.

Online Consent Form Approved by Human Research Ethics Committee (HREC), Murdoch University
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CONSENT 

STATEMENT

• Read and understand the “Consent Statement” carefully.
• Click “NEXT” button to continue.
• Click “CLOSE” button to stop the study.
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CONSENT FORM 

• If you have read and understand the “Consent Form”, type the CONSENT KEY provided.
• Click “NEXT” button to continue.
• Click “CLOSE” button to stop the study.

The CONSENT KEY is generated randomly using a CAPTCHA algorithm. CAPTCHAs are used to
prevent spammers or software agents from performing actions that might be used to make a
profit for the person running an agent.

[Screenshot of browser dialog: “The Web page you are viewing is trying to close the window.
Do you want to close this window?”]

CONSENT FORM

• If you typed the wrong CONSENT KEY, you have to type the newly generated CONSENT KEY.
• Click “NEXT” button to continue.
• Click “CLOSE” button to stop the study.

New CONSENT KEY has been generated.
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START CHATTING 

• Click “START” button to start a chat session.
• Click “CLOSE” button to stop the study.
• Click “HELP” button to find out how to add AINI to your MSN Messenger Contact list or to
check the System Requirements for this study.

When participants agreed to add AINI as their buddy for the first time, an alert was sent
to the buddy notifying them of their participation in the study. Participants were free to
chat about any topic. During the chat session, participants were not restricted to any
questions or regulations.

[Screenshot of MSN Messenger conversation window: “To: Aini, 13 users online now
<ainibot@hotmail.com>”; “Aini, 13 users online now is not in your contact list. Click here
to add him or her now”]

Aini says: Thanks for participating this study. Hi, my name is Aini. Can I help you?

[Banner announcing a giveaway of 600 cards and 2 iPod Nanos]
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HELP 


• To add AINI to your MSN Messenger Contact list, choose “Click here to add him or her now”.
• You are free to ask any questions or discuss any topics with AINI.
• If you are offline, please sign in to your messenger service first.
• Close the HELP window.

[Browser window title: http://ainibot.murdoch.edu.au Artificial Intelligent Ne...]

©2005-2008 http://ainibot.murdoch.edu.au All rights reserved.
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APPENDIX B: An Example of the Full Sentence Parsing using Natural Language 
Understanding and Reasoning (NLUR) Module for a sentence “Bird flu 
did occur in which countries?” 


SENTENCE PARSING 


INPUT 


SENTENCE 0 

Bird flu did occur in which countries? 


OUTPUT OF X-MINIPAR 

(
E2 (() U * )
E0 (() fin C E2 )
1 (Bird ~ N 2 nn (gov flu))
2 (flu ~ N 4 s (gov occur))
3 (did do Aux 4 aux (gov occur))
4 (occur ~ V E0 i (gov fin))
E3 (() flu N 4 subj (gov occur) (antecedent 2))
5 (in ~ Prep E2 p)
6 (which ~ Det 7 det (gov country))
7 (countries country N 5 pcomp-n (gov in))
)
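The X-MINIPAR entries above follow a regular shape, so a short parser can turn them into records for later stages. This sketch infers the field layout from this trace only (index, word, lemma with "~" meaning "same as the word", grammatical category, head, relation and governor). It is not based on MINIPAR's own documentation, and it simply skips the empty-category lines (E0, E2, E3).

```python
import re
from typing import NamedTuple, Optional


class DepEntry(NamedTuple):
    index: int
    word: str
    lemma: str
    category: str
    head: str                 # index of the head token, or an empty node like E2
    relation: Optional[str]   # e.g. nn, s, aux, pcomp-n
    governor: Optional[str]   # word named in "(gov ...)"


# Layout inferred from entries such as "7 (countries country N 5 pcomp-n (gov in))".
ENTRY = re.compile(
    r"^(?P<index>\d+)\s+\("
    r"(?P<word>\S+)\s+(?P<lemma>\S+)\s+(?P<cat>\S+)\s+(?P<head>[\w*]+)"
    r"(?:\s+(?P<rel>[\w-]+))?"
    r"(?:\s+\(gov\s+(?P<gov>[^)]+)\))?"
    r"\)"
)


def parse_entry(line: str) -> Optional[DepEntry]:
    """Parse one numbered entry; return None for E-lines or other text."""
    m = ENTRY.match(line.strip())
    if m is None:
        return None
    lemma = m.group("word") if m.group("lemma") == "~" else m.group("lemma")
    return DepEntry(int(m.group("index")), m.group("word"), lemma,
                    m.group("cat"), m.group("head"), m.group("rel"), m.group("gov"))
```

Each resulting record carries the same offset, category, relationship and head columns that the dependency-graph table below lists.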

Visualisation of X-MINIPAR Dependency Graph

offset  modifier  grammatical  grammatical     head     (head offset)
                  categories   relationships
1       Bird      N            nn              flu      (2)
2       flu       N            s               occur    (4)
3       do        Aux          aux             occur    (4)
4       occur     V            i               fin      (E0)
5       in        Prep         p                        (E2)
6       which     N            wh              country  (7)
7       country   N            pcomp-n         in       (5)

NAMED-ENTITY RECOGNITION FOR query (SENTENCE 0)

INPUT GRAMMATICAL CATEGORIES & RELATIONSHIPS

1  N     nn       Bird     flu (2)
2  N     s        flu      occur (4)
3  Aux   aux      do       occur (4)
4  V     i        occur    fin (E0)
5  Prep  p        in       (E2)
6  N     wh       which    country (7)
7  N     pcomp-n  country  in (5)
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NOUN PHRASE CHUNKING 

<< (START-OF-HASH) << Bird(1) << flu(2) >> do(3) >> occur(4) >> in(5) >> which(6) >> country(7)
>> (END-OF-HASH) >>


<< (START-OF-HASH) << Bird(1) << flu(2) << do(3) << occur(4) << in(5) << which(6) << country(7)
>> (END-OF-HASH) >>


Bird(1) is a noun but has already been use in existing NP Bird flu (1,2) . consider no more.


do (3) is not a noun, consider no more.


which(6) is a noun and has not been used in existing NP Bird flu (1,2) . considering for new
NP .

which(6) is a noun but has already been use in existing NP which country (6,7) . consider no
more .


NOUN PHRASE CHUNKING OUTPUT 
2 Bird flu 
7 which country 
CATEGORY ASSIGNMENT 

ASSIGNMENT STARTS: Using the head flu to look for a match in gazetteer 
FIRST PASS FOR flu 
no direct match for flu 
SECOND PASS is carried out 

the head of noun phrase flu cannot be further tokenized. using the entire noun phrase Bird 
flu 

using token Bird, try to find a partial match.
using token flu, try to find a partial match.

the token flu has a positive match from a generic (triggering word): flu.

trying to fit the token flu to the pattern: (.*\s{TOKEN}).

instantiating the pattern to obtain (.*\sflu).

the instantiated pattern matches the noun phrase Bird flu 

attributes identified from this instance 

=> disease_name : Bird flu 

flu is tagged as disease. 

flu produces the following attributes disease_name(X,Bird flu). 

ASSIGNMENT ENDS 


ASSIGNMENT STARTS: Using the head country to look for a match in gazetteer 
FIRST PASS FOR country 
no direct match for country 
SECOND PASS is carried out 

the head of noun phrase country cannot be further tokenized. using the entire noun phrase
which country

using token which, try to find a partial match. 

using token country, try to find a partial match. 

no specific categories can be assigned to which country 

which country will assigned with the default category variable(var_desc(X,which country)) 

ASSIGNMENT ENDS 
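The two-pass assignment traced above can be sketched as follows: first a direct gazetteer lookup on the head, then a partial match on trigger tokens using the pattern (.*\s{TOKEN}). The gazetteer, the trigger table and the default attribute `desc` are hypothetical stand-ins for AINI's knowledge bases; only the pattern shape and the output notation are taken from the trace.

```python
import re
from typing import Dict

# Hypothetical lookup tables; AINI's real gazetteers are not reproduced here.
GAZETTEER: Dict[str, str] = {"germany": "location"}   # exact head matches
TRIGGERS: Dict[str, str] = {"flu": "disease"}          # generic trigger words
ATTRIBUTES: Dict[str, str] = {"disease": "disease_name", "variable": "var_desc"}
TEMPLATE = r"(.*\s{TOKEN})"  # pattern shape shown in the trace


def assign_category(noun_phrase: str, head: str) -> str:
    """Two-pass category assignment for a chunked noun phrase."""
    # First pass: direct gazetteer match on the head noun.
    if head.lower() in GAZETTEER:
        return GAZETTEER[head.lower()]
    # Second pass: find a trigger token, instantiate the pattern, and
    # check that the whole noun phrase fits it.
    for token in noun_phrase.split():
        category = TRIGGERS.get(token.lower())
        if category is None:
            continue
        pattern = TEMPLATE.replace("{TOKEN}", re.escape(token))
        if re.fullmatch(pattern, noun_phrase):
            return category
    return "variable"  # default category, as assigned to "which country"


def category_term(noun_phrase: str, head: str) -> str:
    """Render the assignment in the trace's term notation."""
    category = assign_category(noun_phrase, head)
    attribute = ATTRIBUTES.get(category, "desc")
    return f"{category}({attribute}(X,{noun_phrase}))"
```

For the two noun phrases of Sentence 0, `category_term` reproduces the terms listed in the trace: disease(disease_name(X,Bird flu)) and variable(var_desc(X,which country)).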


2  Bird flu        disease(disease_name(X,Bird flu))
7  which country   variable(var_desc(X,which country))

RELATION INFERENCE FOR query (SENTENCE 0) 

INPUT GRAMMATICAL CATEGORIES & RELATIONSHIPS

1  N     nn       Bird     flu (2)
2  N     s        flu      occur (4)
3  Aux   aux      do       occur (4)
4  V     i        occur    fin (E0)
5  Prep  p        in       (E2)
6  N     wh       which    country (7)
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7 N 


pcomp-n 


country in (5) 


NOUN PHRASES & THEIR CATEGORIES 

2 Bird flu disease(disease_name(X,Bird flu)) flu 

7 which country variable(var_desc(X,which country)) country 

TYPE 1 (POSSESSION) RELATIONS 

finding for tokens that have gen relation modifying a head 
TYPE 2 (APPOSITIVE) RELATIONS 

finding for tokens that have appo relation modifying a head 
TYPE 3 (SUBJ-V-OBJ) RELATIONS 

finding for a pair of tokens that have subj and obj relation respectively modifying a head
verb

a verb do occur was found, looking for its subject and object 

a subject Bird flu and object invisible was found modifying the head verb do occur, 
constructing a triple relation. 

Bird flu do occur invisible 

TYPE 4 (PREPOSITIONAL MODIFIER) RELATIONS 

finding for a preposition and its direct object that modifies either a noun or a verb 
found the preposition in and the N invisible it modifies, the direct object of the 
preposition is which country, constructing a triple relation. 

invisible in which country 
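The Type 3 and Type 4 steps can be approximated over the chunked phrases. In this sketch the auxiliary and main verb are assumed to be already merged into "do occur", and the `Chunk` records are typed in by hand from the tables above. Any slot with no filler receives the placeholder invisible, which is how the trace marks the argument the question asks for.

```python
from typing import Dict, List, NamedTuple, Tuple

INVISIBLE = "invisible"  # placeholder for the missing (queried) argument


class Chunk(NamedTuple):
    offset: str     # offset of the phrase head, e.g. "2"
    phrase: str     # chunked phrase, e.g. "Bird flu"
    category: str   # N, V, Prep, ...
    relation: str   # relation to its head, e.g. "s", "pcomp-n"
    head: str       # offset of the governing token, e.g. "4" or "E2"


def svo_triples(chunks: List[Chunk]) -> List[Tuple[str, str, str]]:
    """Type 3: (subject, verb, object) per verb, gaps filled with INVISIBLE."""
    triples = []
    for verb in (c for c in chunks if c.category == "V"):
        deps = [c for c in chunks if c.head == verb.offset]
        subj = next((c.phrase for c in deps if c.relation in ("s", "subj")), INVISIBLE)
        obj = next((c.phrase for c in deps if c.relation == "obj"), INVISIBLE)
        triples.append((subj, verb.phrase, obj))
    return triples


def prep_triples(chunks: List[Chunk]) -> List[Tuple[str, str, str]]:
    """Type 4: (modified phrase, preposition, prepositional object)."""
    by_offset: Dict[str, Chunk] = {c.offset: c for c in chunks}
    triples = []
    for prep in (c for c in chunks if c.category == "Prep"):
        pobj = next((c.phrase for c in chunks
                     if c.head == prep.offset and c.relation == "pcomp-n"), INVISIBLE)
        # An empty-category head such as E2 has no surface phrase.
        modified = by_offset[prep.head].phrase if prep.head in by_offset else INVISIBLE
        triples.append((modified, prep.phrase, pobj))
    return triples
```

With the four chunks of Sentence 0, the two functions return ("Bird flu", "do occur", "invisible") and ("invisible", "in", "which country"), the same triples the trace constructs.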

DISCOURSE INTEGRATION FOR query 
POSSESSIVE RELATIONS 
APPOSITIVE RELATIONS 
SUBJ-V-OBJ RELATIONS

Bird flu (0.2) do occur (0.4) invisible (0.0) 

PREPOSITIONAL PHRASES 

INPUT 

invisible (0.E2) in (0.5) which country (0.7) 

ENTITY OBJECTS 

0.0 entity(desc(X,invisible)) 

0.2 disease(disease_name(X, Bird flu)) 

0.7 variable(var_desc(X,which country)) 

INSTANTIATION OF ENTITY CLASSES AND THEIR ATTRIBUTES 
instantiating a new entity class entity(X) 

a new object entity(1519e762376c94457b985c9e86edf1ab) is obtained with the following
attributes:

desc(1519e762376c94457b985c9e86edf1ab, invisible)
instantiating a new entity class disease(X)

a new object disease(43e7efee7c58619c29f1a2b86f857ae3) is obtained with the following
attributes:

disease_name(43e7efee7c58619c29f1a2b86f857ae3, Bird flu)
instantiating a new entity class variable(X) 

a new object variable(cf97ecfa503ad868c05bc035808b04cb) is obtained with the following 
attributes: 

var_desc(cf97ecfa503ad868c05bc035808b04cb,which country) 

TRIGGERS FOR EVENT CLASSES 

attempt triggering using nouns (ONLY FOR QUERIES) 
using the noun invisible(0.0) 
using the noun Bird flu(0.2) 

the noun Bird flu(0.2) was triggered by the trigger word flu[outbreaks] 
the pattern for noun Bird flu(0.2) is no pattern 
using the noun which country(0.7) 
attempt triggering using verbs 
using the verb do occur(0.4) 

the verb do occur(0.4) was triggered by the trigger word occur[outbreaks] 
the pattern for verb do occur(0.4) is {DISEASE}RELATION{LOCATION} 

attempt triggering using prepositions 
using the preposition in(0.5) 

the preposition in(0.5) was triggered by the trigger word in[outbreaks] 
the pattern for preposition in(0.5) is {VARIABLE|DISEASE}RELATION{LOCATION} 

RESOLVING ANAPHORA 
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INSTANTIATION OF EVENT CLASSES AND THEIR ATTRIBUTES 

instantiating a new event class outbreaks to obtain
outbreaks(1b1cc7f086b3f074da452bc3129981eb2)

instantiating a new event class outbreaks to obtain
outbreaks(1b1cc7f086b3f074da452bc3129981eb2)

instantiating a new event class outbreaks to obtain
outbreaks(1b1cc7f086b3f074da452bc3129981eb2)

filling the attributes of event objects 

relation Bird flu(0.2)[outbreaks] has the accompanying instances: this is a noun-triggered 
event, thus, no associated attributes, patterns and maps, 
dynamically creating map 

finding other trigger words of category outbreaks that has in_pattern 
relation do occur(0.4)[outbreaks] has the accompanying instances:{Bird flu(0.2)[disease]}do 
occur{invisible(0.0)[entity]} 

the instances to the relation do occur FULFILLED the pattern {disease}RELATION{location} 
and the map {disease_name}RELATION{the_location} is used to fill the objects to the 
attributes of event object 

disease in the first pattern matches the object disease(0.2) 

so, the object disease(43e7efee7c58619c29f1a2b86f857ae3) will be used to fill attribute
disease_name

location in the second pattern matches the object entity (0.0)

so, the object entity(1519e762376c94457b985c9e86edf1ab) will be used to fill attribute
the_location

attributes-value pair collected so far for the event object
outbreaks(1b1cc7f086b3f074da452bc3129981eb2)

=> disease_name(1b1cc7f086b3f074da452bc3129981eb2, 43e7efee7c58619c29f1a2b86f857ae3)
=> the_location(1b1cc7f086b3f074da452bc3129981eb2, 1519e762376c94457b985c9e86edf1ab)

relation in(0.5)[outbreaks] has the accompanying
instances:{invisible(0.E2)[variable]}in{which country(0.7)[variable]}

the instances to the relation in DOES NOT fulfill the pattern
{variable|disease}RELATION{location}

checking for inherited attributes among event objects 
removing parent event objects 

EVENTS AND THE RELATED ENTITY OBJECT OBTAINED FROM CURRENT SENTENCES
outbreaks(1b1cc7f086b3f074da452bc3129981eb2)
disease_name(1b1cc7f086b3f074da452bc3129981eb2, 43e7efee7c58619c29f1a2b86f857ae3)
disease(43e7efee7c58619c29f1a2b86f857ae3)
disease_name(43e7efee7c58619c29f1a2b86f857ae3, Bird flu)

outbreaks(1b1cc7f086b3f074da452bc3129981eb2)
the_location(1b1cc7f086b3f074da452bc3129981eb2, 1519e762376c94457b985c9e86edf1ab)
entity(1519e762376c94457b985c9e86edf1ab)
desc(1519e762376c94457b985c9e86edf1ab, invisible)

entity object attributes 2 attributes 

entity objects 2 objects 

event object attributes 2 attributes 

event objects 2 objects 

OUTPUT OF DISCOURSE INTEGRATION: QUERY NETWORK 

'1b1cc7f086b3f074da452bc3129981eb2','is','outbreaks'
'1b1cc7f086b3f074da452bc3129981eb2','disease_name','43e7efee7c58619c29f1a2b86f857ae3'
'43e7efee7c58619c29f1a2b86f857ae3','is','disease'
'43e7efee7c58619c29f1a2b86f857ae3','disease_name','Bird flu'
'1b1cc7f086b3f074da452bc3129981eb2','is','outbreaks'
'1b1cc7f086b3f074da452bc3129981eb2','the_location','1519e762376c94457b985c9e86edf1ab'
'1519e762376c94457b985c9e86edf1ab','is','entity'
'1519e762376c94457b985c9e86edf1ab','desc','invisible'

NETWORK-BASED ADVANCED REASONING 

question: Bird flu did occur in which countries 

NETWORK-TO-PATH REDUCTION 
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from the query network, the set of path sequences Q from the leaf node to the root node is 
obtained: 

path sequence q1: Bird flu, disease_name, 43e7efee7c58619c29f1a2b86f857ae3,
disease_name, 1b1cc7f086b3f074da452bc3129981eb2, is, outbreaks

from the query network, the set of path sequence A from the leaf node invisible to the root
node is obtained:

path sequence a1: invisible, desc, 1519e762376c94457b985c9e86edf1ab, the_location,
1b1cc7f086b3f074da452bc3129981eb2, is, outbreaks
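Network-to-path reduction amounts to walking from a leaf value up through its parent nodes to the event node and then appending that node's type. The sketch assumes each node has a single parent in the query network (true for this query), and it replaces the hash identifiers with short hypothetical labels E1, D1 and L1.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


def leaf_to_root_path(triples: List[Triple], leaf: str) -> List[str]:
    """Path from a leaf value to the root event type, as in q1 and a1."""
    parent: Dict[str, Tuple[str, str]] = {}
    node_type: Dict[str, str] = {}
    for subj, rel, obj in triples:
        if rel == "is":
            node_type[subj] = obj          # e.g. E1 is outbreaks
        else:
            parent[obj] = (rel, subj)      # obj hangs below subj via rel
    path, node, seen = [leaf], leaf, {leaf}
    while node in parent:
        rel, node = parent[node]
        if node in seen:                   # guard against accidental cycles
            break
        seen.add(node)
        path += [rel, node]
    if node in node_type:
        path += ["is", node_type[node]]
    return path
```

Feeding it the eight query-network triples (with E1 for the event, D1 for the disease object and L1 for the location entity) yields the two sequences q1 and a1 shown above.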

SELECTIVE PATH MATCHING & RELAXATION OF EVENT CONSTRAINT 

using the semantic network, we try to verify that each path sequence in Q conditionally 
exists in S 
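A simplified sketch of selective path matching with relaxation, under these assumptions: identifiers are treated as free variables; the known path must agree with stored paths on its leaf, relation sequence and event type; and for the invisible branch only the relation into the event (the_location) and the event type must agree, so the stored leaf relation (country rather than desc) is relaxed. The semantic network in the example is a hypothetical miniature that uses short labels in place of the hash identifiers, and the graph is assumed to be acyclic.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


def all_paths(triples: List[Triple], leaf: str) -> List[List[str]]:
    """Every leaf-to-root path; in a semantic network a node may have several parents."""
    parents: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    node_type: Dict[str, str] = {}
    for subj, rel, obj in triples:
        if rel == "is":
            node_type[subj] = obj
        else:
            parents[obj].append((rel, subj))
    paths: List[List[str]] = []

    def walk(node: str, path: List[str]) -> None:
        ups = parents.get(node, [])
        if not ups:
            if node in node_type:          # reached a typed root (an event)
                paths.append(path + ["is", node_type[node]])
            return
        for rel, up in ups:
            walk(up, path + [rel, up])

    walk(leaf, [leaf])
    return paths


def answer_query(network: List[Triple], known: List[str], unknown: List[str]) -> List[str]:
    """Fill the invisible leaf of `unknown` using events that match `known`."""
    # Events reached from the known leaf via the same relations and root type.
    events: Set[str] = {
        p[-3] for p in all_paths(network, known[0])
        if p[1::2] == known[1::2] and p[-1] == known[-1]
    }
    leaves = {o for _, _, o in network} - {s for s, _, _ in network}
    answers: Set[str] = set()
    for leaf in leaves:
        for p in all_paths(network, leaf):
            # Relaxation: only the relation into the event and the event
            # type must agree; the leaf-side relation may differ.
            if p[-3] in events and p[-4] == unknown[-4] and p[-1] == unknown[-1]:
                answers.add(leaf)
    return sorted(answers)
```

With a miniature network holding two Bird flu outbreak events located in Malaysia and Germany, `answer_query` returns ['Germany', 'Malaysia'], mirroring the two locations matched in the trace.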

finding the path q1 Bird flu, disease_name, 43e7efee7c58619c29f1a2b86f857ae3,
disease_name, 1b1cc7f086b3f074da452bc3129981eb2, is, outbreaks from semantic network.
The path is is UNIQUE

MATCHED the path bird flu, disease_name, 43e7efee7c58619c29f1a2b86f857ae3,
disease_name, 6a8219886c22af2e6f91a0253fe70d1e2, is, outbreaks


MATCHED 

the 

path 

bird flu, 

disease name, 

43e7efee7c58619c29fla2b86f857ae3, 

disease 

name 

83dab3 67c7387e24b731198a990 998321, 

is, 

outbreaks 

MATCHED 

the 

path 

bird flu, 

disease name, 

43e7efee7c58619c29fla2b86f857ae3, 

disease 

.name 

i, a567fc0ef49e0d47316c0fc6de23de7 65, 

is, 

outbreaks 

MATCHED 

the 

path 

bird flu, 

disease name, 

43e7efee7c58619c29fla2b86f857ae3, 

disease 

.name 

f 9cal7 6513eb7edle6a73c7 80aa9818 63, 

is, 

outbreaks 


potential UNIQUE event f9ca176513eb7ed1e6a73c780aa981863 from semantic network that appears
in at least one path sequence

using the semantic network, proceed to find the value for invisible leaf node in set A
of master event f9ca176513eb7ed1e6a73c780aa981863

the event f9ca176513eb7ed1e6a73c780aa981863 appears in 1 out of the 1 paths in Q and it
is a UNIQUE.

finding the path invisible, desc, 1519e762376c94457b985c9e86edf1ab, the_location,
1b1cc7f086b3f074da452bc3129981eb2, is, outbreaks

MATCHED the path Malaysia, country(2), 756e795131d1efac6307a117aa7ec8ec,
the_location, f9ca176513eb7ed1e6a73c780aa981863, is, outbreaks

potential UNIQUE event 83dab367c7387e24b731198a990998321 from semantic network that appears
in at least one path sequence q1

using the semantic network, proceed to find the value for invisible leaf node in set A
of master event 83dab367c7387e24b731198a990998321

the event 83dab367c7387e24b731198a990998321 appears in 1 out of the 1 paths in Q and it
is a UNIQUE.

finding the path invisible, desc, 1519e762376c94457b985c9e86edf1ab, the_location,
1b1cc7f086b3f074da452bc3129981eb2, is, outbreaks

MATCHED the path Germany, country(2), 2125c467080e7396e8e66d5dafc44196,
the_location, 83dab367c7387e24b731198a990998321, is, outbreaks

potential UNIQUE event a567fc0ef49e0d47316c0fc6de23de765 from semantic network that appears
in at least one path sequence q1

using the semantic network, proceed to find the value for invisible leaf node in set A
of master event a567fc0ef49e0d47316c0fc6de23de765

the event a567fc0ef49e0d47316c0fc6de23de765 appears in 1 out of the 1 paths in Q and it
is a UNIQUE.

finding the path invisible, desc, 1519e762376c94457b985c9e86edf1ab, the_location,
1b1cc7f086b3f074da452bc3129981eb2, is, outbreaks

MATCHED the path Thailand, country(2), 2d12b535baf0916403a5d98dfe23fb6e,
the_location, a567fc0ef49e0d47316c0fc6de23de765, is, outbreaks

potential UNIQUE event 6a8219886c22af2e6f91a0253fe70d1e2 from semantic network that appears
in at least one path sequence q1

using the semantic network, proceed to find the value for invisible leaf node in set A
of master event 6a8219886c22af2e6f91a0253fe70d1e2

the event 6a8219886c22af2e6f91a0253fe70d1e2 appears in 1 out of the 1 paths in Q and it
is a UNIQUE.

finding the path invisible, desc, 1519e762376c94457b985c9e86edf1ab, the_location,
1b1cc7f086b3f074da452bc3129981eb2, is, outbreaks

MATCHED the path Vietnam, country(2), b394edf1054b38ccee98af4663785a99,
the_location, 6a8219886c22af2e6f91a0253fe70d1e2, is, outbreaks

4 answer(s) were discovered: Malaysia, Germany, Vietnam, Thailand
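The selective matching and relaxation steps can be summarised as follows: treat the hashed instance identifiers in a query path as variables, find every path in S that matches q1, keep the event node of each match, then match a1 against S for those events while letting both the invisible leaf and the predicate attached to it (desc) vary, so that country(2) is accepted. The Python sketch below encodes these rules as assumptions inferred from the trace; the toy S contains only the facts visible in the matched paths, the UNIQUE bookkeeping is omitted, and all function names are hypothetical.

```python
import re

# Assumption: 32/33-character lowercase hex tokens are instance identifiers (variables).
ID = re.compile(r"[0-9a-f]{32,33}")

DISEASE = "43e7efee7c58619c29f1a2b86f857ae3"
Q1 = ["Bird flu", "disease_name", DISEASE, "disease_name",
      "1b1cc7f086b3f074da452bc3129981eb2", "is", "outbreaks"]
A1 = ["invisible", "desc", "1519e762376c94457b985c9e86edf1ab", "the_location",
      "1b1cc7f086b3f074da452bc3129981eb2", "is", "outbreaks"]

# Toy semantic network S rebuilt from the matched paths: event -> (location node, country).
EVENTS = {
    "f9ca176513eb7ed1e6a73c780aa981863": ("756e795131d1efac6307a117aa7ec8ec", "Malaysia"),
    "83dab367c7387e24b731198a990998321": ("2125c467080e7396e8e66d5dafc44196", "Germany"),
    "a567fc0ef49e0d47316c0fc6de23de765": ("2d12b535baf0916403a5d98dfe23fb6e", "Thailand"),
    "6a8219886c22af2e6f91a0253fe70d1e2": ("b394edf1054b38ccee98af4663785a99", "Vietnam"),
}
S = [(DISEASE, "is", "disease"), (DISEASE, "disease_name", "bird flu")]
for event, (place, country) in EVENTS.items():
    S += [(event, "is", "outbreaks"), (event, "disease_name", DISEASE),
          (event, "the_location", place), (place, "country(2)", country)]


def paths_from(triples, node):
    """Every leaf-to-root path starting at `node` (follows incoming non-'is' edges)."""
    parents = [(s, p) for s, p, o in triples if o == node and p != "is"]
    if not parents:
        types = sorted({o for s, p, o in triples if s == node and p == "is"})
        return [[node, "is", t] for t in types]
    return [[node, p] + tail for s, p in parents for tail in paths_from(triples, s)]


def matches(query, candidate, relax_leaf=False):
    """Token-by-token match; IDs are variables, labels compare case-insensitively."""
    if len(query) != len(candidate):
        return False
    for i, (q, c) in enumerate(zip(query, candidate)):
        if relax_leaf and i < 2:   # unknown leaf and its relaxed predicate
            continue
        if ID.fullmatch(q):        # instance identifier: matches any node
            continue
        if q.lower() != c.lower():
            return False
    return True


def answer(q_path, a_path, s):
    """Events matched by q_path, then the leaves of a_path-matching paths for those events."""
    leaves = {o for _, p, o in s if p != "is"} - {sub for sub, _, _ in s}
    s_paths = [path for leaf in sorted(leaves) for path in paths_from(s, leaf)]
    events = [c[-3] for c in s_paths if matches(q_path, c)]  # node just before 'is'
    return [c[0] for e in events for c in s_paths
            if c[-3] == e and matches(a_path, c, relax_leaf=True)]


print(answer(Q1, A1, S))
```

On this toy network the sketch returns the same four countries as the trace, possibly in a different order.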

Note:

The query results¹⁰⁹ show four answers (Malaysia, Germany, Vietnam and Thailand), drawn from
four different trustworthy websites extracted using AKEA, as follows:

1. Germany - www.pandemicflu.gov/general/workshopmorning.html

MATCHED the path Germany, country(2), 2125c467080e7396e8e66d5dafc44196, the_location,
83dab367c7387e24b731198a990998321, is, outbreaks

2. Malaysia - www.whitehouse.gov/news/releases/2005/11/20051101-1.html
MATCHED the path Malaysia, country(2), 756e795131d1efac6307a117aa7ec8ec, the_location,
f9ca176513eb7ed1e6a73c780aa981863, is, outbreaks

3. Vietnam - www.who.int/csr/disease/avian_influenza/avian_faqs/en/index.html
MATCHED the path Vietnam, country(2), b394edf1054b38ccee98af4663785a99, the_location,
6a8219886c22af2e6f91a0253fe70d1e2, is, outbreaks

4. Thailand - www2a.cdc.gov/HAN/ArchiveSys/ViewMsgV.asp?AlertNum=00221
MATCHED the path Thailand, country(2), 2d12b535baf0916403a5d98dfe23fb6e, the_location,
a567fc0ef49e0d47316c0fc6de23de765, is, outbreaks


109 Execution time for the query is 2.03 seconds on a Dell Precision PWS380 server (3 GHz, 1 GB of memory).
APPENDIX C: Google PageRank™ Checksum Calculator¹¹⁰ - Assigning
Numerical Weightings to Hyperlinked Documents
Indexed by a Google Search Engine


/* Google PageRank™ Checksum Calculator */
header("Content-Type: text/plain; charset=utf-8");

define('GOOGLE_MAGIC', 0xE6359A60); // define the named constant GOOGLE_MAGIC (checksum seed)

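// unsigned shift right: emulates a logical right shift on PHP's signed 32-bit integers
function zeroFill($a, $b)
{
    $z = hexdec('80000000');
    if ($z & $a)
    {
        // sign bit set: shift once, clear the sign-extended top bit,
        // set bit 30, then shift the remaining $b-1 places
        $a = ($a >> 1);
        $a &= (~$z);
        $a |= 0x40000000;
        $a = ($a >> ($b - 1));
    }
    else
    {
        $a = ($a >> $b);
    }
    return $a;
}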

function mix($a, $b, $c) { // Google checksum mixing step (Bob Jenkins' lookup2 mix); relies on 32-bit integer wraparound
    $a -= $b; $a -= $c; $a ^= (zeroFill($c, 13));
    $b -= $c; $b -= $a; $b ^= ($a << 8);
    $c -= $a; $c -= $b; $c ^= (zeroFill($b, 13));
    $a -= $b; $a -= $c; $a ^= (zeroFill($c, 12));
    $b -= $c; $b -= $a; $b ^= ($a << 16);
    $c -= $a; $c -= $b; $c ^= (zeroFill($b, 5));
    $a -= $b; $a -= $c; $a ^= (zeroFill($c, 3));
    $b -= $c; $b -= $a; $b ^= ($a << 10);
    $c -= $a; $c -= $b; $c ^= (zeroFill($b, 15));

    return array($a, $b, $c);
}


// Calculate the Google checksum from the given URL (passed as an array of character codes)
function GoogleCH($url, $length=null, $init=GOOGLE_MAGIC) {
    if (is_null($length)) {
        $length = sizeof($url);
    }

    $a = $b = 0x9E3779B9;
    $c = $init;
    $k = 0;
    $len = $length;

    // consume the input in 12-byte blocks, three little-endian words at a time
    while ($len >= 12) {
        $a += ($url[$k+0] + ($url[$k+1] << 8) + ($url[$k+2] << 16) + ($url[$k+3] << 24));
        $b += ($url[$k+4] + ($url[$k+5] << 8) + ($url[$k+6] << 16) + ($url[$k+7] << 24));
        $c += ($url[$k+8] + ($url[$k+9] << 8) + ($url[$k+10] << 16) + ($url[$k+11] << 24));
        $mix = mix($a, $b, $c);
        $a = $mix[0]; $b = $mix[1]; $c = $mix[2];
        $k += 12;
        $len -= 12;
    }

    $c += $length;


110 The Google PageRank™ Checksum Calculator script is originally from
http://www.googlecommunity.com/scripts/google-pagerank.php; it was modified and is hosted at
http://ainibot.org/pagerank
    switch ($len) { /* all the case statements fall through */
        case 11: $c += ($url[$k+10] << 24);
        case 10: $c += ($url[$k+9] << 16);
        case 9:  $c += ($url[$k+8] << 8);
        /* the first byte of c is reserved for the length */
        case 8:  $b += ($url[$k+7] << 24);
        case 7:  $b += ($url[$k+6] << 16);
        case 6:  $b += ($url[$k+5] << 8);
        case 5:  $b += ($url[$k+4]);
        case 4:  $a += ($url[$k+3] << 24);
        case 3:  $a += ($url[$k+2] << 16);
        case 2:  $a += ($url[$k+1] << 8);
        case 1:  $a += ($url[$k+0]);
        /* case 0: nothing left to add */
    }

    $mix = mix($a, $b, $c);
    /* report the result */
    return $mix[2];
}


// converts a string into an array of integers containing the numeric value of each char
function strord($string) {
    $result = array();
    for ($i = 0; $i < strlen($string); $i++) {
        $result[$i] = ord($string[$i]);
    }
    return $result;
}


// This is the function used to get the PageRank value.
function getrank($url, $prefix="info:", $datacenter="www.google.com") {
    $url = $prefix . $url;
    $ch = GoogleCH(strord($url));
    $file = "http://$datacenter/search?client=navclient-auto&ch=6$ch&features=Rank&q=$url";

    $oldlevel = error_reporting(0);   // silence warnings if the request fails
    $data = file($file);
    error_reporting($oldlevel);

    if (!$data || preg_match("/(.*)\.(.*)/i", $url) == 0) return "N/A";
    $rankarray = explode(':', $data[2]);   // split the rank line on ':' and take the third field
    $rank = trim($rankarray[2]);
    if ($rank == "") return "N/A";
    return $rank;
}


// datacenter e.g. "www.google.com", "216.239.53.99", "66.102.11.99".
function getrealrank($url, $datacenter="www.google.com") {
    $ch = GoogleCH(strord($url));
    // xmltoarray() is not defined in this listing
    $array = xmltoarray("http://$datacenter/search?client=navclient-auto&ch=6$ch&q=$url");
    $infoarray = $array['GSP'][0]['RES'][0]['R'];

    $nonwwwurl = str_replace("www.", "", $url);
    $urlpermutationsarray = array(
        $url, "http://" . $url, "http://www." . $url,
        $url . "/", "http://" . $url . "/", "http://www." . $url . "/",
        $nonwwwurl, "http://" . $nonwwwurl, "http://www." . $nonwwwurl,
        $nonwwwurl . "/", "http://" . $nonwwwurl . "/", "http://www." . $nonwwwurl . "/"
    );

    for ($i = 0; $i < count($infoarray); $i++) {
        $urlU = $infoarray[$i]['U'];
        $urlrank = $infoarray[$i]['RK'];
        foreach ($urlpermutationsarray as $permutation) {
            if (strtolower($permutation) == strtolower($urlU)) return $urlrank;
        }
    }

    return "Unknown"; // if no matches found
}
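As listed, the PHP code relies on 32-bit integer wraparound: on a 64-bit PHP build the subtractions and left shifts in mix() no longer wrap at 32 bits, so the checksum can differ. For comparison, the Python sketch below performs the same computation with explicit unsigned 32-bit masking; the routine is Bob Jenkins' lookup2 hash seeded with GOOGLE_MAGIC. The function names are hypothetical, and `toolbar_query_url` only builds the request string that getrank() would fetch (Google's toolbar PageRank service is no longer available).

```python
# Minimal sketch (assumption): Python port of the PHP checksum with explicit 32-bit arithmetic.
MASK = 0xFFFFFFFF            # keep every intermediate value in unsigned 32 bits
GOOGLE_MAGIC = 0xE6359A60    # seed used as the initial value of c


def mix(a, b, c):
    """lookup2 mix; a plain >> on unsigned values replaces zeroFill()."""
    a = (a - b - c) & MASK; a ^= c >> 13
    b = (b - c - a) & MASK; b ^= (a << 8) & MASK
    c = (c - a - b) & MASK; c ^= b >> 13
    a = (a - b - c) & MASK; a ^= c >> 12
    b = (b - c - a) & MASK; b ^= (a << 16) & MASK
    c = (c - a - b) & MASK; c ^= b >> 5
    a = (a - b - c) & MASK; a ^= c >> 3
    b = (b - c - a) & MASK; b ^= (a << 10) & MASK
    c = (c - a - b) & MASK; c ^= b >> 15
    return a, b, c


def google_ch(data: bytes, init: int = GOOGLE_MAGIC) -> int:
    """Checksum of a byte string, mirroring GoogleCH(strord($url))."""
    a = b = 0x9E3779B9
    c = init
    k = 0
    # Full 12-byte blocks, read as three little-endian 32-bit words.
    while len(data) - k >= 12:
        a = (a + int.from_bytes(data[k:k + 4], "little")) & MASK
        b = (b + int.from_bytes(data[k + 4:k + 8], "little")) & MASK
        c = (c + int.from_bytes(data[k + 8:k + 12], "little")) & MASK
        a, b, c = mix(a, b, c)
        k += 12
    c = (c + len(data)) & MASK
    # Tail of 0-11 bytes (the fall-through switch); the low byte of c holds the length.
    tail = data[k:]
    a = (a + int.from_bytes(tail[0:4], "little")) & MASK
    b = (b + int.from_bytes(tail[4:8], "little")) & MASK
    c = (c + (int.from_bytes(tail[8:11], "little") << 8)) & MASK
    return mix(a, b, c)[2]


def toolbar_query_url(url: str, datacenter: str = "www.google.com") -> str:
    """Builds the same request URL as getrank(); nothing is fetched."""
    q = "info:" + url
    ch = google_ch(q.encode("ascii"))
    return f"http://{datacenter}/search?client=navclient-auto&ch=6{ch}&features=Rank&q={q}"


print(toolbar_query_url("ainibot.org"))
```

Masking after every step keeps each value in the range 0 to 2^32 - 1, which makes zeroFill() unnecessary: a right shift of a non-negative Python integer is already a logical shift.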